

Proceedings of the American Academy of Arts and Sciences. 


Vol. 68. No. 9. August, 1933.





LEAST SQUARES AND LAWS OF POPULATION GROWTH. 


By EDWIN B. WILSON AND RUTH R. PUFFER.


Department of Vital Statistics, Harvard School of Public Health. 






WILSON, EDWIN B. AND PUFFER, RUTH R. Least Squares and Laws of Population
Growth. pp. 285-382. August, 1933. $1.75.



























Received June 10, 1933. Presented April 12, 1933. 
TABLE OF CONTENTS.

I. GENERAL INTRODUCTORY DISCUSSION.
1. The Malthusian law
2. The finite and the differential equation
3. Integration constants and constants of Nature
4. The logistic
5. The exhaustion of available land
6. Available land in the U. S. A.
7. Indeterminability of the Natural Constants
8. Populations not always increasing
9. The law of superposition
10. Discontinuities in rate of growth
11. Limitations of the logistic
12. Types of logistic
13. The critical logistic

II. FITTING THE LOGISTIC.
14. Simple methods of fitting
15. Instability of the constants
16. Cases of stability of the constants
17. Logistics infinite in finite time
18. Necessity of definite methods of fitting
19. Conditions for a least squares fit of a logistic
20. Other conditions that might be applied
21. Least squares by relative residuals
22. Least squares with weights
23. What weights should be used?
24. Fitting the Malthusian; first approximation
25. The Malthusian; second approximation
26. Approximation equations for the Malthusian
27. Modified approximation equations
28. May one modify the approximation equations?
29. The standard errors of the constants
30. [title illegible in source]
31. The approximation equations for the logistic
32. Manner of judging convergence to the solution
33. Illustration of convergence for a Malthusian
34. Discussion of the convergence
35. The quadratic expansion at minimum
36. England and Wales, 1801-1911
37. [title illegible in source]
38. [title illegible in source]
39. Connecticut, New Jersey, New York and the Three States, [dates illegible in source]
40. Reed and Pearl on Connecticut
41. Reed and Pearl on Additivity
42. The Three States by relative residuals, 1790-1920
43. New York City and Environs, 1790-1920
44. A method which might replace least squares
45. Application of this method

III. THE AUGMENTED LOGISTIC.
46. [title illegible in source]
47. The conditions for a minimum; the imaginary case
48. Convergence in the imaginary case
49. The augmented critical and the imaginary case
50. [title illegible in source]
51. Germany, 1816-1855 and 1861-1910
52. England and Wales, 1801-1911
53. Development about the minimum
54. Connecticut and the Three States, 1790-1920
55. New York City and Environs, 1790-1920
56. [title illegible in source]
57. The weights of the observations
58. [title illegible in source]

IV. SUMMARY AND CONCLUSIONS.
59-61. [titles illegible in source]

V. APPENDIX. NOTES ON LEAST SQUARES.
62. [title illegible in source]
63. Higher moments and Edgeworth’s theorem
64. Usual expression for variance
65. [title illegible in source]
66. [title illegible in source]
67. The case of k unknowns with weights
68-72. [titles illegible in source]













I. GENERAL INTRODUCTORY DISCUSSION. 


1. The Malthusian Law. Although various empirical expressions 
have been used to fit census data,² few of them have been dignified
with the name of “laws” of growth and few have been considered to
have any rational basis. The simplest rational law which has been
proposed is the so-called Malthusian Law³ that population tends to
increase in geometric ratio, the rate of increase being proportional to
the population as $dP/dt = nP$ or $P = Ce^{nt}$, where $n$ is the survival
rate. This law may be rationalized by pointing out that under constant
conditions the birth rate and the death rate should remain constant
and consequently also the survival rate which is their difference.
The law is irrational in two respects: First, with indefinite increase of
time the population increases indefinitely, as is manifestly impossible.
Second, although the populations of two different districts (such as
an urban and its suburban region), being mere enumerations, must
add to form the population of the combined metropolitan district,
$P_1 = C_1 e^{n_1 t}$ and $P_2 = C_2 e^{n_2 t}$ do not add to give an expression of the
form $P = Ce^{nt}$ unless $n_1$ and $n_2$ are equal. The first of these difficulties
was much emphasized by Malthus and has been recently stressed
by the neo-Malthusians; the second stares any practical person in the 
face if he tries to discuss the growth of partial and combined popula- 
tions by the device of plotting on semi-log paper. 
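
The second difficulty can be seen numerically. The following is a minimal sketch under assumed values: the two districts and their constants are hypothetical, and `semilog_slope` is our own helper for the slope that would be read from semi-log paper. With unequal survival rates the slope of the combined population drifts upward toward the larger rate; with equal rates it stays constant.

```python
import math


def malthusian(c, n, t):
    """Population C * exp(n * t) under the Malthusian law."""
    return c * math.exp(n * t)


def semilog_slope(population, t, dt=1.0):
    """Slope of log P between t and t + dt, as read off semi-log paper."""
    return math.log(population(t + dt) / population(t)) / dt


# Hypothetical districts; the constants are illustrative, not census values.
def urban(t):
    return malthusian(1.0, 0.10, t)


def suburban(t):
    return malthusian(0.2, 0.30, t)


def combined(t):
    return urban(t) + suburban(t)


def combined_equal_rates(t):
    # Same survival rate n in both districts: the sum is again Malthusian.
    return malthusian(1.0, 0.10, t) + malthusian(0.2, 0.10, t)


slopes_unequal = [semilog_slope(combined, t) for t in range(10)]
slopes_equal = [semilog_slope(combined_equal_rates, t) for t in range(10)]

for t, (u, e) in enumerate(zip(slopes_unequal, slopes_equal)):
    print(f"t={t}: unequal rates {u:.4f}, equal rates {e:.4f}")
```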

2. The Finite and the Differential Equation. In discussing laws of 
growth, and indeed laws of any kind, it is convenient to distinguish 
between the finite equation which expresses the phenomenon as a 
function of the time and the differential equation which results from 
the elimination of some of the parameters in the finite equation. 
Thus if from $P = Ce^{nt}$ the constant $C$ be eliminated by differentiation,
the differential equation $dP/dt = nP$ results, and its integration reintroduces
the constant $C$ or its equivalent. The constant $C$ is determined
for any particular population by using the observed value of
$P$, say $P_0$, at a given time $t_0$ and noting that $P_0 = Ce^{nt_0}$ must hold, so
that $C = P_0 e^{-nt_0}$ and $P = P_0 e^{n(t - t_0)}$. It is possible also to eliminate
the constant $n$ by a second differentiation to obtain the differential
equation

$$\frac{d}{dt}\frac{d\log P}{dt} = 0 \quad\text{or}\quad \frac{d^2P}{dt^2} - \frac{1}{P}\left(\frac{dP}{dt}\right)^2 = 0. \tag{1}$$


To determine the two constants which arise in the integration of this
equation we should need to know from observation not only the value
$P_0$ of $P$ at the time $t_0$ but also the value of $dP/dt$ at that time, or
some equivalent conditions, such as the values of P at two specified 
times. If consideration be had for the fact that statistical fluctua- 
tions from any theoretical law must be expected it may be preferred 
to determine both the constants by some method of curve fitting 
which uses all the observed values of P and t. 
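
As an illustration of the simplest of these alternatives, the values of P at two specified times fix both constants of the Malthusian. The sketch below is ours, with hypothetical observations; `malthusian_through_two_points` is an illustrative name and not a procedure taken from the paper.

```python
import math


def malthusian_through_two_points(t0, p0, t1, p1):
    """Constants (C, n) of P = C * exp(n * t) passing through two observations."""
    if t1 == t0:
        raise ValueError("the two observation times must differ")
    n = math.log(p1 / p0) / (t1 - t0)
    c = p0 * math.exp(-n * t0)
    return c, n


def malthusian(c, n, t):
    return c * math.exp(n * t)


# Hypothetical observations: P = 1.5 at t = 2 and P = 3.0 at t = 7.
c, n = malthusian_through_two_points(2.0, 1.5, 7.0, 3.0)
print(f"C = {c:.4f}, n = {n:.4f}")
```

Such a determination uses only two of the observations and so carries their statistical fluctuations in full, which is the reason for preferring a method of curve fitting that uses all of them.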

3. Integration Constants and Constants of Nature. In most prob- 
lems involving natural law one does not seek to eliminate all the con- 
stants or parameters which enter into the finite equation but one 
divides the constants into two categories, one the group of disposable 
constants which will be determined empirically from the observations 
and one the group of natural constants which refer to intrinsic proper- 
ties of the system studied; indeed some have gone so far as to state 
that the search for natural law is a search for constants of nature. 
In the Malthusian law $P = Ce^{nt}$, $C$ is indubitably of the nature of a
constant of integration arising from the intrinsic differential equation
$dP/dt = nP$, or arising from the finite equation without recourse
to the differential equation by virtue of the consideration that for
the law $P = Ce^{nt}$ a change of the origin of time, which is arbitrary,
changes the value of $C$; but $n$ could well be regarded as a natural
constant, namely, the natural survival rate of a species,⁴ to be determined
once for all and not to be fittable separately in each case,
and on this hypothesis we should not proceed to the differential 
equation (1). 

4. The Logistic. Shortly after Malthus’ time Verhulst⁵ proposed
that the effect of the pressure of the population on the food supply
and the reaction upon the survival rate of the population might be
expressed by the formula $dP/dt = nP - \phi(P)$, where $\phi$ is some function
of $P$ increasing with $P$, and he gave especial attention to the
simple case⁶ $\phi(P) = naP^2$ so that⁷

$$\frac{dP}{dt} = n(P - aP^2) \quad\text{or}\quad P = \frac{1}{a + be^{-nt}}. \tag{2}$$





Thus he defined a curve which he called the logistic. Here n is the 
survival rate when P is small, i. e., before there is perceptible pressure 
of the population on the food supply, and a is the reciprocal of the 
limiting population, and $b$ is a constant of integration. This law has
been proposed again by Pearl and Reed who have fitted it to a goodly
number of populations.⁸ They have considered all three constants
a, b, n to be disposable in each particular case, thus leaving no place
for natural constants in the population problem. As they have been
interested largely in forecasting population growths and in empha- 
sizing the impossibility of an unlimited population they have natur- 
ally laid their chief emphasis on the interpretation of the constant 
$1/a = K$ which is the value to which the population tends asymptotically
in infinite time. Yule,⁹ however, has treated the problem
mainly for the light it throws on the constant n, which measures the 
natural survival rate in an unlimited environment and for the course 
it would determine for the actual survival rate which is n — aP. 
That Yule finds practically the same value of n for England (.160) 
and for France (.183) which have had such different economic situa- 
tions and have been in such different phases of their growth cycle is 
perhaps some evidence that there is a natural survival rate for man 
of about n = .17 corresponding to a tendency to double the popula- 
tion (in unlimited environment) in about four decades. 
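
The statements about the logistic (2) and the doubling time can be checked with a short computation. This is a sketch only: the constants `A_HYP`, `B_HYP`, `N_HYP` are hypothetical, and the function names are ours. It verifies numerically that the finite equation satisfies dP/dt = n(P - aP²), that P tends to 1/a, and that n of about .17 per decade corresponds to doubling in roughly four decades.

```python
import math


def logistic(t, a, b, n):
    """Finite equation of the logistic: P = 1 / (a + b * exp(-n * t))."""
    return 1.0 / (a + b * math.exp(-n * t))


def logistic_rate(p, a, n):
    """Right-hand side of the differential equation dP/dt = n * (P - a * P**2)."""
    return n * (p - a * p * p)


def doubling_time(n):
    """Time to double under unlimited (Malthusian) growth at survival rate n."""
    return math.log(2.0) / n


# Hypothetical constants: limiting population 1/a = 100, time unit the decade.
A_HYP, B_HYP, N_HYP = 0.01, 0.09, 0.17


def numerical_derivative(t, h=1e-5):
    # Central difference of the finite equation at time t.
    ahead = logistic(t + h, A_HYP, B_HYP, N_HYP)
    behind = logistic(t - h, A_HYP, B_HYP, N_HYP)
    return (ahead - behind) / (2.0 * h)


for n in (0.160, 0.183, 0.17):
    print(f"n = {n:.3f} per decade: doubling time {doubling_time(n):.2f} decades")
```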

5. The Exhaustion of Available Land. If we are to think in very 
general biological terms of the problem of the growth of human or 
other (animal or plant) populations we should expect to find two 
natural constants involved, first the natural species survival rate n 
and second the saturation density of population. If $A$ be the area
over which the population ranges and $\alpha$ be the area just necessary on
the average to support the life of one individual of the species, the
limiting stationary population must be $K = 1/a = A/\alpha$. (East¹⁰
estimates $\alpha = 2.5$ acres of tillable land as the amount necessary to
support one human being.) The differential equation could then be
written

$$\frac{d(P/A)}{dt} = n\left[\frac{P}{A} - \alpha\left(\frac{P}{A}\right)^2\right] \quad\text{or}\quad \frac{dD}{dt} = nD(1 - \alpha D) \tag{3}$$

and express the rate of increase of density of population $D = P/A$ as an autocatalytic
equation in the density with the two natural constants $n$
and $\alpha$ of which the second is the reciprocal of the saturation density.
It is interesting, though perhaps not very significant, that the equation
may equally well be written as

$$\frac{d}{dt}\left(\frac{A}{P} - \alpha\right) = -n\left(\frac{A}{P} - \alpha\right) \tag{4}$$

stating that the per capita area of range available at any time in
excess of the necessary minimum $\alpha$ diminishes geometrically at the
rate $n$. In this form the constant $n$ appears not as a survival rate
for the population but as a rate of exhaustion of land per capita
available above the necessary minimum. 
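
The geometric exhaustion of the excess area can be illustrated as follows. The sketch writes `alpha` for the minimum area per individual and takes the density in the logistic form 1/D = alpha + b exp(-nt); the value of `b` and the function names are assumptions of ours, chosen only for illustration.

```python
import math


def density(t, alpha, b, n):
    """Density D(t) solving dD/dt = n * D * (1 - alpha * D)."""
    return 1.0 / (alpha + b * math.exp(-n * t))


def excess_area_per_capita(t, alpha, b, n):
    """Per capita area A/P = 1/D in excess of the necessary minimum alpha."""
    return 1.0 / density(t, alpha, b, n) - alpha


# alpha = 2.5 acres follows the estimate quoted in the text; b and n are hypothetical.
ALPHA, B_HYP, N_HYP = 2.5, 40.0, 0.17

for t in range(5):
    now = excess_area_per_capita(t, ALPHA, B_HYP, N_HYP)
    later = excess_area_per_capita(t + 1, ALPHA, B_HYP, N_HYP)
    # Each decade the excess shrinks by the same factor exp(-n).
    print(f"decade {t}: excess {now:.3f} acres, ratio to next {later / now:.5f}")
```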

6. Available Land in the U. S. A. It may be remarked parentheti- 
cally that growth curves are applied empirically not alone to popula- 
tions but to anything that grows such as automobile production or 
body weight. To illustrate with a matter germane to the immediate 
discussion of saturation density (or its reciprocal) as a natural con- 
stant in the population problem, we may ask what are the prospects 
for the “acreage of improved land in farms in the United States” as 
indicated by fitting the logistic to the census figures of such acreage¹¹
in millions, namely, 
Year       1850  1860  1870  1880  1890  1900  1910  1920
Acreage     113   163   189   285   308   414   478   503


It will be noted that the rate of increase dropped greatly between 
1860 and 1870 and again between 1910 and 1920. If the figure for 
1920 be given much weight, it will bring the asymptotic value down 
under 600, but disregarding 1920 and 1870 we estimate graphically 
an asymptotic value between 620 and 640 (see plot on growth paper¹²).
East estimated 800 as the most generous value which could be assigned 
to the acreage (in millions) tillable in the United States. To get so 
large a figure he allowed only 360 million acres for forest and wood- 
land, or 19%, whereas France, as he points out, retains 25% of her 
area in forest and Germany even more; our present figure is 600 or 
32%. If we retain ultimately something in excess of 25% of our 
area in forest and woodland, we should have to increase East’s figure 
of 360 to 480+, and decrease his figure of 800 for tillable land to
680−. As he purposely stretched all his estimates in favor of increasing
the available improved area in farms, our graphical indication
of an asymptotic value around 630 million acres is corroborative
of his calculation and is a value he might accept as entirely fair. 
This acreage might support 250 million persons at a Japanese or 
Chinese standard of living but scarcely more than 160 million on the 
American standard® (which would allow 4 acres per capita instead of 
the minimum of 2.5 and instead of our present value of about 5). 
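
The arithmetic of this article may be retraced as follows. The figures are those quoted in the text (millions of acres); the variable names are ours, and the total area is inferred here from the statement that 360 million acres is 19%, which is an assumption of the sketch rather than a figure given by East.

```python
ASYMPTOTIC_ACREAGE = 630.0  # graphical estimate in the text, millions of acres
EAST_TILLABLE = 800.0       # East's most generous tillable acreage
EAST_FOREST = 360.0         # East's allowance for forest and woodland
EAST_FOREST_SHARE = 0.19    # stated share of total area


def persons_supported(acreage_millions, acres_per_capita):
    """Millions of persons supportable at a given acreage per capita."""
    return acreage_millions / acres_per_capita


total_area = EAST_FOREST / EAST_FOREST_SHARE            # about 1895
forest_at_25_percent = 0.25 * total_area                # about 474; the text rounds up to 480+
present_forest_share = 600.0 / total_area               # about 0.32
revised_tillable = EAST_TILLABLE - (480.0 - EAST_FOREST)  # 680

print(f"minimum standard (2.5 acres): {persons_supported(ASYMPTOTIC_ACREAGE, 2.5):.0f} million")
print(f"American standard (4 acres):  {persons_supported(ASYMPTOTIC_ACREAGE, 4.0):.0f} million")
```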

7. Indeterminability of the Natural Constants. Although from a
theoretical point of view one may be very desirous of having some
constants of nature in the formulation of any natural problem, and
may sincerely hope that at some future time one may so treat $n$ and
$\alpha$ as above adumbrated, there are from the practical point of view
great present difficulties in assigning values to such constants, partly
because of their dependence upon standards of living which some
students of population problems regard as likely to determine the
limiting population more effectively than any value of the minimum
area necessary barely to support life, partly because of the complications
due to trade which make it difficult to determine the area from
which a population really derives its sustenance (the British may live
in England but surely not off it!), and partly because of complications
due to movements of population which have occurred both internationally
and intranationally during the periods for which the censuses
of population are available. It is at present a far cry from the problem
of describing the growth of various populations to that of constructing
a theory of population growth for humans or for species seriously
upset by human activity in taking up and modifying the aspect of
the land. Hence it is practically a necessity to regard the functional
forms adopted for fitting curves to population data as empirical expressions,
and there is a justification for treating all the constants as disposable
in the fitting.

[Figure: THE LOGISTIC GRID. Ordinates: Percentage of Saturation.]

Limits      1-2     2-8     8-16    16-30   30-50   percent
Limits      99-98   98-92   92-84   84-70   70-50   percent
Intervals   0.25    0.5     1.0     2.0     2.5     percent

8. Populations Not Always Increasing. One may go further and 
doubt whether there is any general rational law of population growth. 
To give expectation that there be such a law we should have to 
postulate that conditions governing the growth remain constant 
during at least that length of time which interests us in a practical 
way for the problem at hand. A study of the history of populations 
in the past shows that populations in considerable districts not only 
rise but fall. A noteworthy case of fall in recent times is found in 
Ireland which reached a maximum population of a bit above eight 
million about 1841 and has subsequently steadily declined to about 
four million. Studies by Beloch on populations of ancient and 
medieval times indicate that the numbers in the region occupied by 
the Roman Empire at its greatest extension were presumably greatest 
around 300 A. D. (estimated at 100 million) and that when again 
the figure was reached for the region as a whole the distribution of the 
population as between Europe, Asia and Africa was quite different. 
He further estimates that Egypt contained about 8 million persons 
under Nero and did not subsequently reach that figure until within 
recent decades. We are familiar with the fall as well as rise and with 
the alternating rise and fall of populations in the animal kingdom. 
It is far from certain that a part of a rational population law for 
humans would be a steady approach to an asymptote as indicated by 
the logistic. 

9. The Law of Superposition. If we wish to proceed to set up the 
possibilities under the most general terms we meet little encourage- 
ment. The population forecaster who desires to treat districts 
which may be combined into larger districts would desire a strictly 
additive law. This would mean that after the elimination by differentiation
of all the truly fittable (as distinguished from the natural)
constants he would desire a linear differential equation to express
intrinsically the law of growth of population. For most practical
cases it would be necessary to assume that there was no natural
origin of time, i. e., that the change of $t$ into $t$ plus a constant in the finite equation
should merely change the values of the fittable constants (as is the 
case with the logistic) and in the differential equation should make no 
change at all. These conditions seem equivalent to assuming a 
linear differential equation with constant coefficients. Such equa- 
tions are familiar in applied mathematics and are often specifically 
stated as a consequence of the “law of superposition” (or additivity) ; 
they have as solutions sums of exponential or trigonometric terms in 
the time. No such expression seems to be satisfactory as a general 
law of population growth—it seems qualitatively wrong over any 
sufficiently extended period of time. 

10. Discontinuities in Rate of Growth. Thus we are pretty surely 
thrown back on empiricism and the question is whether any law is 
reasonably satisfactory over a reasonable range of time and for a 
reasonably large number of cases. Knibbs'® has maintained that the 
census facts show rather clearly that populations do not grow at a con- 
tinuously diminishing rate as assumed by Verhulst, but at a constant 
rate for a period and then at another constant rate for a subsequent 
period, and so on. This is well illustrated by the case of Sweden, 
which appears to have had three periods of Malthusian growth;'’ 
first from 1750 to 1800 at the rate n = .052, second from 1810 to 1860 
at the rate n = .099, third from 1870 to 1920 with n = .068. It 
should, however, be remarked that even if populations do grow according
to the Malthusian law with values of $n$ changing from time
to time so that the semi-log plot is a polygon, one may be justified in 
fitting a logistic or other smooth curve as a graduation of the data 
provided the time interval over which the smooth curve holds is 
long compared with the intervals of time over which the individual 
rates of Malthusian growth are maintained. The forecasting value 
of such a smooth curve might, however, be much less than if the 
growth were really along the curve. Thus in the period 1750 to 1870 
Sweden showed on the whole an acceleration rather than retardation 
of the rate of growth and a logistic fitted to this period would become 
infinite in finite time. On the other hand from 1820 to 1920 Sweden 
showed a retarded growth rate and the logistic would give a finite 
asymptote. So Connecticut is fitted nicely by two Malthusians,¹⁷
one from 1790 to 1840 with $n = .055$, the other from 1840 to 1930
with $n = .188$ indicating an accelerated growth over the whole
period for which censuses are available. 

11. Limitations of the Logistic. As the logistic has been used so 
freely recently as a law of growth and as it was very early proposed 
as a modification of the Malthusian law, we shall close this general 
discussion by a few comments upon it, some of which have been 
alluded to in the descriptive material above. The logistic (2) is a 
three-constant generalization of the two-constant Malthusian. It 
has the same difficulty that it is not additive, becomes in fact additive’ 
only when the natural growth rates n are the same and the ratios 
$a : b$ are also the same. As the rational assignment of values to $n$ and
to a is practically impossible, the curve is fitted by treating all three 
constants as disposable in adjusting the curve to the observed popula- 
tions. Under these conditions we are really considering that the law 
of population growth satisfies the differential equation obtained by 
eliminating the three constants, namely, 


$$\frac{d^2}{dt^2}\log\left(\frac{d}{dt}\frac{1}{P}\right) = 0. \tag{5}$$


Furthermore it then becomes impossible to regard the logistic as 
symbolizing a retarded growth rate resulting in a finite saturated value 
for the population because through the process of fitting we may find 
values of opposite sign for the constants $a$ and $b$ which will result mathematically
in forecasting an infinite population at the finite time
$t = -[\log(-a/b)]/n$. This would not be a significant difficulty were
it not for the fact that this situation actually arises in not a few cases 
when attempts are made to fit the logistic to actual data.'!® In such 
instances although the logistic, having an extra constant, will fit the 
observations better than the Malthusian, and may (or may not) fore- 
cast better the populations which will be observed at a few censuses 
following the last one available for fitting, it surely will give a long 
time forecast which is worse than that given by the Malthusian. 
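
The time at which such a fitted logistic becomes infinite follows from setting a + b exp(-nt) = 0. A minimal sketch, with hypothetical constants and a function name of our own choosing, returning no time when the signs of a and b agree:

```python
import math


def time_of_infinite_population(a, b, n):
    """Time at which 1/P = a + b * exp(-n * t) vanishes, or None if it never does."""
    if n == 0 or b == 0 or -a / b <= 0:
        return None
    return -math.log(-a / b) / n


# Hypothetical constants with a < 0 < b; time is counted in census intervals.
n_example = -math.log(0.8)
t_star = time_of_infinite_population(-2.0 / 3.0, 5.0 / 3.0, n_example)
print(f"population becomes infinite after {t_star:.2f} intervals")

# With a and b both positive the curve saturates and no such time exists.
print(time_of_infinite_population(0.01, 0.09, 0.17))
```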

12. Types of Logistic. To bring the matter clearly into the light 
we may classify logistics into five types on the basis of fitting them to 
three equally spaced observations by the usual method. Let the 
value of $P$ be $P_0$, $P_1$, $P_2$, let time be measured from the first observation,
and let $r = e^{-nt}$ where $t$ is the interval of time between the
observations; then

$$1/P_0 = a + b, \qquad 1/P_1 = a + br, \qquad 1/P_2 = a + br^2.$$

These equations will have a definite solution for $a$, $b$, $r$, namely,

$$r = \frac{1/P_1 - 1/P_2}{1/P_0 - 1/P_1}, \qquad a = \frac{1/P_0 \cdot 1/P_2 - (1/P_1)^2}{1/P_0 + 1/P_2 - 2/P_1}, \qquad b = \frac{(1/P_1 - 1/P_0)^2}{1/P_0 + 1/P_2 - 2/P_1}, \tag{6}$$

with one exception, namely, when the populations advance in harmonic
progression with $1/P_0 - 1/P_1 = 1/P_1 - 1/P_2$. In this exceptional
case $r = 1$ and $a = \mp\infty$ and $b = \pm\infty$ and the logistic becomes
indeterminate. This will be called the critical case. The five
types are

1. Hypomalthusian (with saturation), $(1/P_1)^2 < (1/P_0)(1/P_2)$ and $2/P_1 < 1/P_0 + 1/P_2$.
2. Malthusian (no saturation), $(1/P_1)^2 = (1/P_0)(1/P_2)$ and $2/P_1 < 1/P_0 + 1/P_2$.
3. Hypermalthusian (no saturation), $(1/P_1)^2 > (1/P_0)(1/P_2)$ and $2/P_1 < 1/P_0 + 1/P_2$.
4. Critical (no saturation), $(1/P_1)^2 > (1/P_0)(1/P_2)$ and $2/P_1 = 1/P_0 + 1/P_2$.
5. Hypercritical (no saturation), $(1/P_1)^2 > (1/P_0)(1/P_2)$ and $2/P_1 > 1/P_0 + 1/P_2$.



Four of the five cases indicate no saturation, three of them are hyper- 
malthusian. 
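
The three-point solution and the five types can be sketched in code. This is an illustration under stated assumptions and not the authors' computing procedure: the names `fit_three_points` and `classify` and the tolerance `tol` are ours, and a growing population (P_0 < P_1) is assumed so that the ratio r is defined.

```python
def reciprocals(p0, p1, p2):
    return 1.0 / p0, 1.0 / p1, 1.0 / p2


def fit_three_points(p0, p1, p2):
    """Constants (a, b, r) of 1/P = a + b * r**j through three equally spaced points."""
    x0, x1, x2 = reciprocals(p0, p1, p2)
    denominator = x0 + x2 - 2.0 * x1
    if denominator == 0.0:
        raise ValueError("critical case: reciprocals are in arithmetic progression")
    r = (x1 - x2) / (x0 - x1)
    a = (x0 * x2 - x1 * x1) / denominator
    b = (x1 - x0) ** 2 / denominator
    return a, b, r


def classify(p0, p1, p2, tol=1e-12):
    """Name the type of logistic determined by three growing populations."""
    x0, x1, x2 = reciprocals(p0, p1, p2)
    curvature = x0 + x2 - 2.0 * x1  # zero in the critical case
    gap = x1 * x1 - x0 * x2         # zero in the Malthusian case
    if abs(curvature) <= tol:
        return "critical"
    if curvature < 0:
        return "hypercritical"
    if abs(gap) <= tol:
        return "Malthusian"
    return "hypermalthusian" if gap > 0 else "hypomalthusian"


print(fit_three_points(1.0, 1.5, 2.5))
for triple in [(1, 1.5, 2), (1, 2, 4), (1, 1.5, 2.5), (1, 1.5, 3), (1, 1.5, 3.5)]:
    print(triple, classify(*triple))
```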

13. The Critical Logistic. In the critical or indeterminate case, it 
is not the curve which is indeterminate as a geometrical object but 
the constants in the logistic form. There is no difficulty in taking 
the limit, as r approaches 1 and a becomes infinite, in such a manner 
as to show that the limiting curve has the form 

$$\frac{1}{P} = \alpha - \beta t \quad\text{or}\quad P = \frac{1}{\alpha - \beta t}, \tag{7}$$

with $\alpha = 1/P_0$ and $\beta$ equal to $1/P_0 - 1/P_1$ divided by the interval
between observations, which becomes infinite at $t = \alpha/\beta$. It may be remarked that as

$$n = -\frac{1}{t}\log r = \frac{1}{t}\log\left(\frac{1/P_1 - 1/P_0}{1/P_2 - 1/P_1}\right)$$

the interpretation of $n$ as the rate of growth when the population is
small is equally possible in the first three cases. In the critical case
$n$ becomes zero, which may still be interpreted; but in the hypercritical
case $n$ becomes negative. The following table gives a numerical
example. The reason for the discussion of the critical case is
partly because population data have been found which seem to be
critical if not hypercritical and partly because of curve fitting difficulties
which arise in that case and to which we have to allude (Art. 48).

Type            | P_0 | P_1 | P_2  | a      | b      | r       | nt
Hypermalthusian | 1   | 1.5 | 2.5  | -2/3   | 5/3    | 4/5     | .223
       "        | 1   | 1.5 | 2.97 | -32    | 33     | 98/99   | .0102
Critical        | 1   | 1.5 | 3    | ∓∞*    | ±∞*    | 1       | 0
Hypercritical   | 1   | 1.5 | 3.03 | 104/3  | -101/3 | 102/101 | -.0099
       "        | 1   | 1.5 | 3.5  | 10/3   | -7/3   | 8/7     | -.134

*$\alpha = 1$ and, assuming $t = 1$, $\beta = 1/3$. Thus $1/P = 1 - t/3$.

II. FITTING THE LOGISTIC.

14. Simple Methods of Fitting. We have given the formulas (6)
for finding $a$, $b$, $r = e^{-nt}$ for the logistic from three values of $P$ separated
in time by equal amounts $t$, the origin of time being at the
first of the three observations. Naturally, in practice one would 
space the three points well apart. If the number of censuses, supposed 
themselves equally spaced, were odd we could use the first, middle and 
last censuses or the second, middle and next to the last. If we wished 
to throw more weight into the later censuses as likely to be more 
determinative of the subsequent course of population we could ignore 
the first few altogether. If statistical fluctuations seemed to be 
rendering the individual points inadequately reliable we could use a 
graphical method of trial based on growth paper, or in the absence of 
such paper we could assume various values of the constant a and plot 
on semi-log paper 1/P — a to ascertain what value of a seemed to 
give the best straight line

$$\log_{10}(1/P - a) = \log_{10} b - n\log_{10} e \cdot t \tag{8}$$


and by this formula we could equally well make the study on ordinary
plotting paper. Another method would be to follow Yule’s suggestion⁹
of summation of the reciprocals of the populations in equal equally
spaced groups of $k$, the formulas being

$$\sum_1 \frac{1}{P} = ka + b\,\frac{1 - r^k}{1 - r}, \qquad \sum_2 \frac{1}{P} = ka + br^p\,\frac{1 - r^k}{1 - r}, \qquad \sum_3 \frac{1}{P} = ka + br^{2p}\,\frac{1 - r^k}{1 - r},$$

where the initial times of the groups are $p$ units $t$ apart. These equations with

$$a' = ka, \qquad b' = b\,\frac{1 - r^k}{1 - r}, \qquad r' = r^p \tag{9}$$

are identical in form with those for three points and thus permit the
use of (6) to find $a'$, $b'$, $r'$ from which $a$, $b$, $r$ immediately follow.
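
The summation method may be sketched as below. The code is an illustration of equations (6) and (9) as reconstructed here, checked on synthetic populations generated from an exact logistic; the function names and the synthetic constants are assumptions of ours, and the critical case r = 1 is not handled.

```python
def solve_three(x0, x1, x2):
    """Formulas (6) applied to three reciprocals or three sums of reciprocals."""
    denominator = x0 + x2 - 2.0 * x1
    r = (x1 - x2) / (x0 - x1)
    a = (x0 * x2 - x1 * x1) / denominator
    b = (x1 - x0) ** 2 / denominator
    return a, b, r


def yule_summation_fit(populations, k, p):
    """Fit 1/P = a + b * r**j from three groups of k censuses starting p censuses apart."""
    if len(populations) < 2 * p + k:
        raise ValueError("not enough censuses for three groups")
    sums = [sum(1.0 / v for v in populations[g * p: g * p + k]) for g in range(3)]
    a_prime, b_prime, r_prime = solve_three(*sums)
    if r_prime <= 0:
        raise ValueError("r' must be positive to recover r")
    r = r_prime ** (1.0 / p)                  # r' = r**p
    a = a_prime / k                           # a' = k * a
    b = b_prime * (1.0 - r) / (1.0 - r ** k)  # b' = b * (1 - r**k) / (1 - r)
    return a, b, r


# Synthetic check: populations generated from known (hypothetical) constants.
TRUE_A, TRUE_B, TRUE_R = 0.05, 0.95, 0.8
synthetic = [1.0 / (TRUE_A + TRUE_B * TRUE_R ** j) for j in range(6)]
print(yule_summation_fit(synthetic, k=2, p=2))
```

On exact logistic data the constants are recovered to rounding error; with real census figures the result depends on which groups are chosen.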

15. Instability of the Constants. The difficulties into which one 
comes when trying to fit a logistic by these methods may be illus- 
trated by taking the case of New York City (present area). The 
census figures are in millions:²⁰

Year    1850   1860   1870   1880   1890   1900   1910   1920
P       0.696  1.175  1.478  1.912  2.507  3.437  4.767  5.620


If now we fit to 1860, 1890, 1920 taking three points as far apart as
possible we get $K = 1/a = -30$ (millions) for limiting population
showing that the logistic is hypermalthusian with no indication of
saturation. If we take 1870, 1890, 1910, we find $K = -5$ and again
hypermalthusian growth. If we take 1850, 1880, 1910 we find $K = 21.5$
showing hypomalthusianism but a well-nigh incredibly large
asymptotic value; and why favor 1850, 1880, 1910 as over against
1860, 1890, 1920 for forecasting? If we apply Yule’s method of summation
(9) using the three pairs 1870-80, 1890-1900, 1910-20 we find
$K = -38$ again indicating no saturation. By the same method
using 1860-70, 1880-90, 1900-10 we find $K = -4.4$. Most of the
estimates therefore fail to give any evidence of saturation, and
furthermore the value of $K$ varies all the way from 21 through infinity
to $-4$. If we discard all the earlier data and use 1900, 1910, 1920
we find $K = 6.4$, which was already exceeded in 1930. The value of
$K$ found by the graphical method (8) using all the points would depend
largely on the judgment of the person using that method.²¹ In
other words these methods as a whole fail to give consistent answers
and thus leave the problem of fitting unsolved. 
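
The instability may be reproduced from the census figures above. The sketch applies the three-point formula for a to several choices of censuses; the dictionary and function names are ours.

```python
# New York City (present area), millions, from the table in the text.
NEW_YORK_CITY = {
    1850: 0.696, 1860: 1.175, 1870: 1.478, 1880: 1.912,
    1890: 2.507, 1900: 3.437, 1910: 4.767, 1920: 5.620,
}


def limiting_population(p0, p1, p2):
    """K = 1/a from three equally spaced populations, using formula (6) for a."""
    x0, x1, x2 = 1.0 / p0, 1.0 / p1, 1.0 / p2
    a = (x0 * x2 - x1 * x1) / (x0 + x2 - 2.0 * x1)
    return 1.0 / a


TRIPLES = [(1860, 1890, 1920), (1870, 1890, 1910), (1850, 1880, 1910), (1900, 1910, 1920)]
results = {
    years: limiting_population(*(NEW_YORK_CITY[y] for y in years))
    for years in TRIPLES
}

for years, k in results.items():
    print(years, f"K = {k:.1f} million")
```

A negative K signals a hypermalthusian fit with no saturation, so the sign as well as the size of the forecast changes with the censuses selected.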

16. Cases of Stability of the Constants. In the case of the United 
States the growth of population has followed the logistic with extra- 
ordinary fidelity” so that almost any method of fitting will lead to 
approximately the same results for the constants. It may not be 
amiss to mention that the United States has been far from a fixed 
area since 1790 and that as a matter of fact most of its present area 
was unsettled and even held under foreign sovereignties at the time 
of the earlier censuses, and further that we have had a large immigra- 
tion.” It is a little difficult to see why, despite these accessions of 
area and of population from without, our growth should be so regu- 
larly according to law unless that law be rather a law of the call of the 
land than a law of population growth based on reproduction. Yule’s 
discussion of England and of France shows reasonable stability in the 
constants for those countries. To cite England as an illustration we 
may point out that using the three censuses 1811, 1861, 1911 we find 
K = 108 (million); Yule by three methods finds K = 91, 97, and 100. 
Such consistency indicates that the constant $K$ is actually mathematically
determinate; whether it implies any comparable accuracy
as a forecast may be doubted. But the instability shown above in 
the case of New York City is by no means rare. 

17. Logistics Infinite in Finite Time. It may be worth while to 
tabulate the values of the constants obtained by various methods for 
a case or two. Take Connecticut and New Jersey and the censuses 
from 1790 to 1920; we find (using millions as the unit, as we shall 
systematically), 

















                 Connecticut                     New Jersey
Method      a        b        n             a        b        n
1a        −5.11     9.44     .0370        −.343     6.02     .170
1b        11.96    −7.76    −.0296       −1.158     6.59     .121
2a        −2.19     7.31     .0719        −.253     7.08     .199
2b        36.54   −32.11    −.0090       −1.082     6.85     .133
3         −1.64     6.64     .0797        −.026     8.17     .245

















(The origin of time affects the value of b, only, and is uniformly 1790 
in this table. The time unit is, as always in this paper, the decade.) 
The first values 1a are based on the three censuses 1800, 1860, 1920;
the second set 1b on 1790, 1850, 1910. The third set 2a is figured by
Yule’s summation method applied to the groups 1810-1840, 1850-1880,
1890-1920; the fourth (2b) is based on the groups 1800-1830,
1840-1870, 1880-1910. In the fifth line (3) are given the values
derived by minimizing the sum of the squares of the deviations between
the observed census figures and the logistic (Art. 39). None of the
methods applied to either State shows saturation; the logistics become
infinite in finite time. The values of K = 1/a are highly variable
as well as non-significant. Even the constant n, which gives the rate
of growth when the population is small, has no stability for Connecticut,
and not much for New Jersey. It is to be noted that for Connecticut
the methods 1b and 2b, based on omitting the last census (1920),
give such acceleration of growth as to make the logistics hypercritical
(a > 0, b < 0, n < 0).
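
The three-census method used for lines 1a and 1b can be sketched as follows. This is only an illustration under stated assumptions: the function names `logistic_through_three_points` and `time_of_infinity` are hypothetical, the logistic is taken in the form P = 1/(a + b e^{-nt}) used throughout the paper, and the only figures taken from the text are the Connecticut constants of line 1a.

```python
import math

def logistic_through_three_points(p0, p1, p2, h=1.0, t0=0.0):
    """Pass P = 1/(a + b*exp(-n*t)) through populations p0, p1, p2
    observed at the equally spaced times t0, t0 + h, t0 + 2h."""
    y0, y1, y2 = 1.0 / p0, 1.0 / p1, 1.0 / p2    # 1/P = a + b*exp(-n*t)
    d1, d2 = y1 - y0, y2 - y1
    if d1 == 0.0 or d2 / d1 <= 0.0 or d2 == d1:
        raise ValueError("no real three-constant logistic through these points")
    r = d2 / d1                                   # r = exp(-n*h)
    n = -math.log(r) / h
    a = y0 - d1 / (r - 1.0)                       # because d1 = (y0 - a)*(r - 1)
    b = (y0 - a) * math.exp(n * t0)               # refer b to the origin t = 0
    return a, b, n

def time_of_infinity(a, b, n):
    """Time at which a + b*exp(-n*t) = 0, or None if the curve stays finite."""
    ratio = -a / b
    if ratio <= 0.0 or n == 0.0:
        return None
    return -math.log(ratio) / n

# Constants quoted in the table for Connecticut, line 1a
# (origin of time 1790, unit of time the decade).
print(time_of_infinity(-5.11, 9.44, 0.0370))
```

When a and b have opposite signs the denominator vanishes at a finite time, which is the sense in which such a logistic becomes infinite in finite time; a saturating logistic (a > 0, b > 0) returns `None`.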

18. Necessity of Definite Methods of Fitting. If to the indetermina- 
tion resulting from specified arithmetical methods of fitting there be 
added the possibilities for variation which arise from graphical methods 
it becomes clear that the logistic which any one person may get must 
often be largely arbitrary and personal. By a sufficient exercise of 
wishful thinking one may even obtain a hypomalthusian logistic show- 
ing saturation for a series of populations which by practically every 
test show a hypermalthusian growth. The fitting of the curve be- 
comes an art in which one gets what he wants, and the values fore- 
cast from the curve become personal estimates with perhaps no greater 
validity than would attach to estimates obtained in any other way 
and probably less validity than would attach to estimates based on an 
analysis of the age distribution, of deaths, and of fecundity.% For 
scientific purposes we must have a method of fitting which is agreed 
upon and the results of which can be duplicated by anybody. For 
such purposes one naturally thinks of the method of least squares, 
whether or no this method should logically be applied to such time 
series as arise in problems of growth. Pearl and Reed have advocated 
this method and have claimed to practice it® 2% *! 34, There are inter- 
esting mathematical problems which arise in the application of the 
method to logistics, and we shall turn our attention to some of them. 

19. Conditions for a Least Squares Fit of a Logistic. In 1806 Legen- 
dre stated the principle that the most satisfactory solution of the 
problem of determining the unknowns from supernumerary equations 
is to render the sum of the squares of the “errors” or deviations a 
minimum.” In the case in hand this means that such values of the 
parameters a, b, and n of the logistic shall be chosen as will minimize 
the function 





\[
S(a,b,n) = \sum_i \left( E_i - \frac{1}{a + b e^{-n t_i}} \right)^2 = \sum_i (E_i - P_i)^2 \tag{10}
\]

where E_i represent the actual observed enumerations at the times t_i
and where P_i are the values computed from the logistic.²⁶ The
conditions for a minimum are²⁷

\[
0 = -\sum (E_i - P_i)\,\frac{\partial P_i}{\partial a} = \sum (E_i - P_i)\,P_i^2, \tag{11a}
\]
\[
0 = -\sum (E_i - P_i)\,\frac{\partial P_i}{\partial b} = \frac{1}{bK} \sum (E_i - P_i)\,P_i (K - P_i), \tag{11b}
\]
\[
0 = \sum (E_i - P_i)\,\frac{\partial P_i}{\partial n} = \frac{1}{K} \sum (E_i - P_i)\,P_i (K - P_i)\,t_i, \tag{11c}
\]

where K = 1/a. These conditions are immediately interpretable.
They state that we must have the sum of the residuals E_i − P_i equal
to zero when those residuals are weighted,²⁸ before summing, in three
different ways: (1) by the squares of the calculated populations P_i,
(2) by the product of the populations and their differences from the
asymptotic population K = 1/a, and (3) by the continued product
of the times and of the populations and of their differences from their
asymptotic value.²⁹
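
As a numerical check on conditions (11), the following sketch compares the three weighted residual sums with a finite-difference gradient of S. The data `TS`, `ES` and the trial constants `PARAMS` are invented for illustration and are not from the paper; the helper names are likewise hypothetical.

```python
import math

def logistic(t, a, b, n):
    return 1.0 / (a + b * math.exp(-n * t))

def sum_sq(params, ts, es):
    a, b, n = params
    return sum((e - logistic(t, a, b, n)) ** 2 for t, e in zip(ts, es))

def condition_sums(params, ts, es):
    """Residual sums weighted as in (11a), (11b), (11c), without the
    constant factors 1/(bK) and 1/K."""
    a, b, n = params
    k_asym = 1.0 / a
    c1 = c2 = c3 = 0.0
    for t, e in zip(ts, es):
        p = logistic(t, a, b, n)
        r = e - p
        c1 += r * p * p
        c2 += r * p * (k_asym - p)
        c3 += r * p * (k_asym - p) * t
    return c1, c2, c3

def half_gradient(params, ts, es, h=1e-6):
    """Central-difference estimate of (1/2) dS/da, (1/2) dS/db, (1/2) dS/dn."""
    grads = []
    for i in range(3):
        up, down = list(params), list(params)
        up[i] += h
        down[i] -= h
        grads.append((sum_sq(up, ts, es) - sum_sq(down, ts, es)) / (4.0 * h))
    return tuple(grads)

# Illustrative (invented) observations and trial constants.
TS = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ES = [1.0, 1.6, 2.4, 3.6, 4.9, 6.2]
PARAMS = (0.1, 0.9, 0.5)

print(condition_sums(PARAMS, TS, ES))
print(half_gradient(PARAMS, TS, ES))
```

With P = 1/(a + b e^{-nt}) one has ½ ∂S/∂a equal to the first sum, ½ ∂S/∂b equal to a/b times the second, and ½ ∂S/∂n equal to −a times the third, so all three sums vanish together at a minimum.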

20. Other Conditions that Might be Applied. Such conditions are, 
of course, different from those to be used if we were to fit by the 
method of moments which would require that the zeroth, first and 
second moments of the residuals with respect to the time should 
vanish, viz., 


\[
\sum (E_i - P_i) = 0, \qquad \sum (E_i - P_i)\,t_i = 0, \qquad \sum (E_i - P_i)\,t_i^2 = 0, \tag{12}
\]

and which are the conditions that would arise in fitting the parabolic
form P = a + bt + ct² by least squares.³⁰ They are also different
from the conditions which correspond to Pearl’s solution of the problem
of fitting the logistic, with a relatively simple computation form.³¹
His equations xxxvi-xxxviii are equivalent to the conditions

\[
\sum \frac{E - P}{P} = 0, \qquad \sum E \left( \frac{E - P}{P} \right) = 0, \qquad \sum t E (K - P) \left( \frac{E - P}{P^2} \right) = 0, \tag{13}
\]


whereas, by combining the first two of our correct conditions (11) we
have for direct comparison the equivalent conditions

\[
\sum P (E - P) = 0, \qquad \sum P^2 (E - P) = 0, \qquad \sum P t (K - P)(E - P) = 0. \tag{14}
\]

It will be noticed that his equations are similar to ours except that
they have been divided respectively by P², P³/E and P³/E, the values
of P³/E being very nearly equal to those of P². Thus his fit makes
the mean residual vanish when weighted by the reciprocal of the
computed population, or by that reciprocal multiplied by the observed
population E, or by the continued product of that reciprocal squared
and by the time and by E and by the difference between the asymptotic
population and P. As Schultz³² has pointed out, Pearl’s least
squares solution is not a least squares solution.

21. Least Squares by Relative Residuals. If we should ask what 
would be the least squares solution under the condition of minimizing 
the sum of the relative residuals we should have to minimize 


\[
\text{(a)}\quad S = \sum \left( \frac{E - P}{E} \right)^2 \qquad \text{or} \qquad \text{(b)}\quad S = \sum \left( \frac{E - P}{P} \right)^2. \tag{15a, b}
\]

The conditions for the first are

\[
\sum \frac{P}{E^2}(E - P) = 0, \qquad \sum \frac{P^2}{E^2}(E - P) = 0, \qquad \sum \frac{P t (K - P)}{E^2}(E - P) = 0, \tag{16a}
\]

and for the second

\[
\sum \frac{E}{P^2}(E - P) = 0, \qquad \sum \frac{E}{P}(E - P) = 0, \qquad \sum \frac{E t (K - P)}{P^2}(E - P) = 0. \tag{16b}
\]

These two solutions will be slightly different and both of them will
be slightly different from Pearl’s, but in so far as P and E may be
treated as equivalent in weighting the residuals prior to summing,²⁸
all three solutions will be the same. Indeed as

\[
\frac{P}{E^2} = \frac{1}{P}\left(\frac{P}{E}\right)^2 \qquad \text{and} \qquad \frac{E}{P^2} = \frac{1}{P}\left(\frac{E}{P}\right)
\]

it is clear that the weights in Pearl’s first condition (13) are
intermediate between those in the first conditions in the least squares
solution by relative residuals taken the two ways. The weights in
his second and third conditions are identical with those of the least
squares fit by relative residuals based on P. It is therefore to be
presumed that though Pearl’s fit is not a least squares fit it is very close
to a least squares fit by relative (not absolute) residuals based on P
but shaded a little toward that based on E.

22. Least Squares With Weights. We could weight the observations. 
The primal conception of a weight is that of a repeated observation. 
An observation of weight w is the equivalent of that observation re- 
peated w times. We should therefore naturally define the problem of 
least squares with weights as that of minimizing 


\[
S(a,b,n) = \sum_i w_i (E_i - P_i)^2 \tag{17}
\]













inserting (E_i − P_i)² into the sum S as many times as its weight w_i
indicates. When the weights do not involve the unknowns, as is
usual, and when the expression P is linear in the unknowns, as is also
usual, the conditions for minimizing Σw(E − P)² lead to the usual
normal equations for least squares solutions with weights.** The 
question might fairly be raised as to whether the problem of minimiz- 
ing S as written is the proper definition of the problem of determining 
the least squares fit when the expression P is non-linear in the con- 
stants and the weights do not involve them. The answer would be 
indubitably affirmative. But the answer is not so sure when the 
weights involve the unknown constants. The differentiation of S 
would then yield equations like 
\[
\frac{1}{2}\frac{\partial S}{\partial a} = -\sum w_i (E_i - P_i)\,\frac{\partial P_i}{\partial a} + \frac{1}{2} \sum (E_i - P_i)^2\,\frac{\partial w_i}{\partial a} = 0 \tag{18}
\]

as those for the minimum. We might prefer not to differentiate the 
weights. We shall not attempt here to determine just what the 
formulation of the problem of fitting by least squares should be when
the weights involve the constants; we shall simply assume that when 
we say least squares we mean that the sum of the squares is to be 
least, and leave the critical discussion to the Appendix. The problem 
of fitting by relative differences (15a) can be stated as minimizing the 
sum of the squares of the absolute residuals weighted inversely as the 
squares of the observed populations, and by analogy we would inter- 
pret (15b) as minimizing the sum of the squares of the absolute re- 
siduals weighted inversely as the squares of the computed populations 
and we differentiate with respect to the parameters wherever they 
occur as we did in deriving (16b). 

23. What Weights Should be Used? The question of the assignment
of weights is always troublesome. It ought to be discussed for the
problem of fitting growth curves. Is a variation of 1 lb. in the weight of
a baby to be treated like a variation of 1 lb. in the adult in
fitting a growth curve to the body weight? Is a deviation from the
logistic of 100,000 when the population is half a million to be
considered as equivalent to the same deviation when the population is
ten million? Shall we minimize the sum of the squares of the absolute
residuals or the sum of the squares of the relative residuals® or
something else? In frequency function problems where χ² is an appropriate
test of goodness of fit we do not minimize the sum of the squares of the
absolute or relative residuals but the sum of the squares of the absolute
residuals divided by the calculated value,** i.e., Σ(E − P)²/P. Here
we have from the sampling theory an indication of what form ¥? 
should take. In some problems we know that variations of the same 
proportionate amounts are equally significant and then we minimize 
the sum of the squares of the relative residuals. We know of no theo- 
retical or empirical considerations which show what weights should 
be applied in fitting logistics to population figures; we shall fit them 
mostly without weights by minimizing the sum of the squares of the 
absolute errors but give a few cases of fitting by minimizing the sum 
of the squares of the relative errors for purposes of comparison.*’ 

24. Fitting the Malthusian; First Approximation. As this matter of 
fitting with and without weights is of importance and as the formulas 
and calculations with the logistic are unnecessarily long for the pur- 
pose of illustrating principles and methods, we shall choose first the 
simple problem, proposed to us by a distinguished astronomer and of 
some importance in itself, of fitting the two-constant Malthusian 
P = Ce^{nt}, a special case of the logistic with a = 0, to a series of values
E so as to minimize the sum of the squares of the errors. The conditions
to minimize

\[
S = \sum (E - C e^{nt})^2 \quad \text{are} \quad \sum (E - P) P = 0 \quad \text{and} \quad \sum (E - P)\,t P = 0. \tag{19}
\]

If we consider the example

\[
t = -1,\ E = 0.5; \qquad t = 0,\ E = 2.5; \qquad t = 1,\ E = 3.875
\]

we see that with P = 1, 2, 4, equations (19) are satisfied and that
consequently P = 2 × 2^t or 2e^{.6931t} and C = 2, n = log 2 = .6931.
The value of S is 0.515625. The usual method of getting a first
approximation to the solution, and one which not infrequently is good
enough for practical purposes, is to take logarithms and fit the linear
expression®® log P = nt + γ, where γ = log C, to the values log E,
i.e., to minimize

\[
S = \sum (\log E - nt - \gamma)^2 \quad \text{by} \quad \sum (\log E - \log P) = 0, \qquad \sum (\log E - \log P)\,t = 0. \tag{20}
\]

Owing to the linearity this may be done at one stroke. For the
example above the method gives

\[
P = 1.6920\, e^{1.02384 t}, \qquad S = 1.3621
\]


and it is seen that the fit is bad. The value 1.692 for C is well re- 
moved from the true value C = 2 and the value n = 1.02384 is not 
near .6931. The value S = 1.3621 is more than two and a half times 
the true value .515625. Extrapolation would be extremely incorrect. 

25. The Malthusian; Second Approximation. Now as d log P =
dP/P one may write dP = P d log P, and if we consider E − P as a
sort of differential of P we should expect that minimizing Σ(E − P)²
would be approximated by minimizing

\[
\sum (\log E - \log P)^2 P^2 \quad \text{or} \quad \sum (\log E - \log P)^2 E^2, \tag{20}
\]

since E and P are nearly equal. That is, in fitting logarithmically we
should weight the observations as the squares of their values.*? The
first form gives no advantage in calculation over minimizing Σ(E − P)²
but the second form is quadratic in the unknowns γ = log C and n,
and leads to linear normal equations. The solution is


\[
P = 2.3087\, e^{.534 t} \quad \text{with} \quad S = .7682
\]


and is much better, though by no means very good, nor safe for any 
considerable extrapolation. As, however, the actual solution of the 
problem of minimizing Σ(E − P)² is that of solving the two equations
of condition

\[
\sum (E - C e^{nt})\, e^{nt} = 0 \quad \text{and} \quad \sum (E - C e^{nt})\, t e^{nt} = 0, \tag{19'}
\]


which are non-linear in the constant n (though, as a matter of fact, 
linear in C), so that the solution must proceed by successive approxi- 
mations, it is clearly to be desired that the start of the work be from 
values as nearly correct as may be, and hence this second solution 
should be preferred as a starting point. 
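
The two logarithmic approximations of Arts. 24 and 25 can be reproduced for the three-point example with a short sketch. The function names are hypothetical; the data and the exact solution C = 2, n = log 2 are those of the text, and the weighted fit uses the weights E² as described above.

```python
import math

T = [-1.0, 0.0, 1.0]
E = [0.5, 2.5, 3.875]

def weighted_log_fit(ts, es, weights):
    """Weighted linear least squares of log E on t; returns (C, n)."""
    ys = [math.log(e) for e in es]
    sw = sum(weights)
    swt = sum(w * t for w, t in zip(weights, ts))
    swtt = sum(w * t * t for w, t in zip(weights, ts))
    swy = sum(w * y for w, y in zip(weights, ys))
    swty = sum(w * t * y for w, t, y in zip(weights, ts, ys))
    det = sw * swtt - swt * swt
    gamma = (swy * swtt - swt * swty) / det     # gamma = log C
    n = (sw * swty - swt * swy) / det
    return math.exp(gamma), n

def sum_sq(c, n, ts, es):
    """Sum of squares of the absolute residuals E - C*exp(n*t)."""
    return sum((e - c * math.exp(n * t)) ** 2 for t, e in zip(ts, es))

first = weighted_log_fit(T, E, [1.0] * len(E))          # Art. 24
second = weighted_log_fit(T, E, [e * e for e in E])      # Art. 25, weights E^2
exact = (2.0, math.log(2.0))                             # satisfies (19)

for label, (c, n) in [("log fit", first), ("log fit, weights E^2", second),
                      ("least squares", exact)]:
    print(f"{label:22s} C = {c:.4f}  n = {n:.4f}  S = {sum_sq(c, n, T, E):.4f}")
```

The weighted fit costs no more than the unweighted one, since both reduce to a pair of linear normal equations.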

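26. The Approximation Equations for the Malthusian. There is
another item connected with the fitting of logistics by least squares
which may be illustrated by the simpler case of the Malthusian, and
that is the derivation of the equations. We have

\[
S(a, b, n) = \sum \left( E - \frac{1}{a + b e^{-nt}} \right)^2
\]

to minimize. We may expand S(a, b, n) about any set of values,
which in practice should be near the minimum. In the simpler case

\[
\begin{aligned}
S(C, n) = \sum (E - C e^{nt})^2 = S_0 &+ \left(\frac{\partial S}{\partial C}\right)_0 \delta C + \left(\frac{\partial S}{\partial n}\right)_0 \delta n \\
&+ \frac{1}{2}\left[ \left(\frac{\partial^2 S}{\partial C^2}\right)_0 \delta C^2 + 2\left(\frac{\partial^2 S}{\partial C\,\partial n}\right)_0 \delta C\,\delta n + \left(\frac{\partial^2 S}{\partial n^2}\right)_0 \delta n^2 \right] + \cdots
\end{aligned}
\tag{21}
\]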


In the usual case of least squares where the constants enter linearly 
in P, S is a quadratic function of the constants and the expansion has 
no higher terms. The conditions for a minimum are*® 




















\[
\begin{aligned}
\frac{1}{2}\frac{\partial S}{\partial C} + \frac{1}{2}\frac{\partial^2 S}{\partial C^2}\,\delta C + \frac{1}{2}\frac{\partial^2 S}{\partial C\,\partial n}\,\delta n &= 0, \\
\frac{1}{2}\frac{\partial S}{\partial n} + \frac{1}{2}\frac{\partial^2 S}{\partial C\,\partial n}\,\delta C + \frac{1}{2}\frac{\partial^2 S}{\partial n^2}\,\delta n &= 0,
\end{aligned}
\tag{22}
\]

and can be solved definitively for δC and δn no matter what values are
taken for C₀, n₀, and it is usual to take the value 0 for both so that
we really solve for C and n directly. When the constants do not enter
linearly in P, S is not quadratic but may be expanded into a series in
the neighborhood of any values of the constants and if the terms
higher than the second be neglected we have a quadratic which will
represent the function in the neighborhood of those values. If the
values are near enough to the minimum sought, the solution of the
equations for δC and δn should give a better value,*\ and the
process may be repeated.

27. Modified Approximation Equations. Another method of reach- 
ing equations to be solved is to expand E — P before squaring 


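\[
S = \sum \left[ E - P - \frac{\partial P}{\partial C}\,\delta C - \frac{\partial P}{\partial n}\,\delta n - \frac{1}{2}\frac{\partial^2 P}{\partial C^2}\,\delta C^2 - \frac{\partial^2 P}{\partial C\,\partial n}\,\delta C\,\delta n - \frac{1}{2}\frac{\partial^2 P}{\partial n^2}\,\delta n^2 - \cdots \right]^2 \tag{23}
\]
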
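In the case where the constants enter P linearly there are no terms
of the second or higher orders. If in the more complicated case we
stop the expansion with the linear terms and square we have a quadratic
in δC, δn which leads to the equations

\[
\begin{aligned}
-\sum (E - P)\frac{\partial P}{\partial C} + \sum \left(\frac{\partial P}{\partial C}\right)^2 \delta C + \sum \frac{\partial P}{\partial C}\frac{\partial P}{\partial n}\,\delta n &= 0, \\
-\sum (E - P)\frac{\partial P}{\partial n} + \sum \frac{\partial P}{\partial C}\frac{\partial P}{\partial n}\,\delta C + \sum \left(\frac{\partial P}{\partial n}\right)^2 \delta n &= 0,
\end{aligned}
\tag{24}
\]

and it is these equations which are ordinarily suggested for solution
for δC and δn to obtain the next approximation.” It should be noted
that when we square as above after rendering the expression E − P
linear in the variation of the parameters, we are really neglecting a
part of the quadratic terms, namely, those arising from the product
of the finite (though presumably small) term E − P and the quadratic
terms in the expansion of P. Really the quadratic expression in δC
and δn is

\[
\begin{aligned}
&\sum (E - P)^2 - 2 \sum (E - P)\left( \frac{\partial P}{\partial C}\,\delta C + \frac{\partial P}{\partial n}\,\delta n \right) + \sum \left( \frac{\partial P}{\partial C}\,\delta C + \frac{\partial P}{\partial n}\,\delta n \right)^2 \\
&\quad - \sum (E - P)\left( \frac{\partial^2 P}{\partial C^2}\,\delta C^2 + 2\frac{\partial^2 P}{\partial C\,\partial n}\,\delta C\,\delta n + \frac{\partial^2 P}{\partial n^2}\,\delta n^2 \right).
\end{aligned}
\]

That which is upon the first line leads to the equations (24) just
written whereas the total expression gives

\[
\begin{aligned}
-\sum (E - P)\frac{\partial P}{\partial C} + \left[ \sum \left(\frac{\partial P}{\partial C}\right)^2 - \sum (E - P)\frac{\partial^2 P}{\partial C^2} \right] \delta C + \left[ \sum \frac{\partial P}{\partial C}\frac{\partial P}{\partial n} - \sum (E - P)\frac{\partial^2 P}{\partial C\,\partial n} \right] \delta n &= 0, \\
-\sum (E - P)\frac{\partial P}{\partial n} + \left[ \sum \frac{\partial P}{\partial C}\frac{\partial P}{\partial n} - \sum (E - P)\frac{\partial^2 P}{\partial C\,\partial n} \right] \delta C + \left[ \sum \left(\frac{\partial P}{\partial n}\right)^2 - \sum (E - P)\frac{\partial^2 P}{\partial n^2} \right] \delta n &= 0.
\end{aligned}
\tag{25}
\]

These two approximation equations for δC and δn are, of course,
different in form only from those (22) written for S but they do differ
from (24).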





















28. May One Modify the Approximation Equations? The question
might be raised as to whether it is proper, in the sense that it leads to
the right answer, to use the simpler equations (24) obtained by reducing
the expression E − P to linear form before squaring rather than
the more complicated ones. A simple illustration will show that
there are sums of squares for which it is not proper. Consider

\[
S = (xy - 1)^2 + (x + y)^2.
\]

The minimum is found from

\[
\begin{aligned}
\frac{1}{2}\frac{\partial S}{\partial x} &= (xy - 1)\,y + (x + y) = x(y^2 + 1) = 0, \\
\frac{1}{2}\frac{\partial S}{\partial y} &= (xy - 1)\,x + (x + y) = y(x^2 + 1) = 0.
\end{aligned}
\]

Obviously there is just one solution (0, 0), which is a true minimum,
and the expansion of S about (0, 0) is 1 + x² + y² + ···. If we now
try to obtain this solution by (22) the equations are

\[
\begin{aligned}
x(y^2 + 1) + (y^2 + 1)\,\delta x + 2xy\,\delta y &= 0, \\
y(x^2 + 1) + 2xy\,\delta x + (x^2 + 1)\,\delta y &= 0,
\end{aligned}
\]

and

\[
\delta x = -\frac{x(x^2 + 1)(1 - y^2)}{1 + x^2 + y^2 - 3x^2 y^2}, \qquad
\delta y = -\frac{y(y^2 + 1)(1 - x^2)}{1 + x^2 + y^2 - 3x^2 y^2}.
\]

If we start from x, y near (0, 0) we get a very rapid convergence to
(0, 0) from equations which are practically δx = −x and δy = −y.

On the other hand if we expand before squaring, retaining only the
linear terms, we have

\[
\begin{aligned}
S' &= (xy + y\,\delta x + x\,\delta y - 1)^2 + (x + y + \delta x + \delta y)^2 \\
&= (xy - 1)^2 + (x + y)^2 + 2x(y^2 + 1)\,\delta x + 2y(x^2 + 1)\,\delta y \\
&\quad + (y^2 + 1)\,\delta x^2 + (x^2 + 1)\,\delta y^2 + 2(xy + 1)\,\delta x\,\delta y.
\end{aligned}
\]

The equations for δx, δy are

\[
\begin{aligned}
x(y^2 + 1) + (y^2 + 1)\,\delta x + (xy + 1)\,\delta y &= 0, \\
y(x^2 + 1) + (xy + 1)\,\delta x + (x^2 + 1)\,\delta y &= 0,
\end{aligned}
\]

and

\[
\delta x = \frac{x^2 + 1}{y - x}, \qquad \delta y = \frac{y^2 + 1}{x - y},
\]

from which no convergence to (0, 0) is possible. The function S′,
quadratic in δx, δy, has a minimum for any assigned values of x and y
provided x and y are not equal, yet that minimum gives no indication
of the minimum of S.
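
The contrast between the two sets of corrections in this illustration is easy to see numerically. The sketch below simply iterates the closed-form corrections given above from an arbitrarily chosen starting point (0.2, −0.1); the function names and the starting point are assumptions made for the illustration.

```python
def complete_step(x, y):
    """Corrections from the complete second-order equations (22)."""
    d = 1 + x * x + y * y - 3 * x * x * y * y
    dx = -x * (x * x + 1) * (1 - y * y) / d
    dy = -y * (y * y + 1) * (1 - x * x) / d
    return dx, dy

def linearized_step(x, y):
    """Corrections obtained by linearizing before squaring; undefined if x == y."""
    return (x * x + 1) / (y - x), (y * y + 1) / (x - y)

def s_value(x, y):
    return (x * y - 1) ** 2 + (x + y) ** 2

def iterate(step, x, y, count):
    """Apply the given correction rule `count` times and return every iterate."""
    path = [(x, y)]
    for _ in range(count):
        dx, dy = step(x, y)
        x, y = x + dx, y + dy
        path.append((x, y))
    return path

START = (0.2, -0.1)
for name, step in [("complete", complete_step), ("linearized", linearized_step)]:
    x, y = iterate(step, START[0], START[1], 4)[-1]
    print(f"{name:10s} x = {x: .6f}  y = {y: .6f}  S = {s_value(x, y):.6f}")
```

After a single linearized step the iterates satisfy x + y = 0 and thereafter wander along that line without approaching the origin, whereas the complete equations settle on (0, 0) within a few steps.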

29. The Standard Errors of the Constants. Whether such a case can 
arise in a least squares solution to the problem of fitting a curve we 
do not know; we have never met such a case.* On the contrary we 
have found that in most of the cases in which we have tried compara- 
tively the two sets of equations (24) and (25), the simpler equations 
actually converge to the answer somewhat more quickly than the 
more complicated ones.* But one thing does appear reasonably clear, 
and that is that if we desire to discuss the standard errors of the un- 
knowns by using the ordinary formulas we should logically use the 
true quadratic expansion (25) of S about the minimum. The standard 
errors of C and n being respectively® 


\[
\sqrt{\frac{\sum (E - P)^2}{k - 2}}\,\sqrt{\frac{\tfrac{1}{2}\,\partial^2 S/\partial n^2}{H}}, \qquad
\sqrt{\frac{\sum (E - P)^2}{k - 2}}\,\sqrt{\frac{\tfrac{1}{2}\,\partial^2 S/\partial C^2}{H}}, \tag{26}
\]

where

\[
4H = \frac{\partial^2 S}{\partial C^2}\,\frac{\partial^2 S}{\partial n^2} - \left( \frac{\partial^2 S}{\partial C\,\partial n} \right)^2
\]

and k is the number of observations.
(It is understood that all the quantities are to be evaluated for those
values of C and n which determine the minimum.) In evaluating the
values (26) it is important not to reject any constants from the
equations (24) or (25) such as may have been rejected in discussing the
conditions for the minimum.’°

30. The Forms of the Logistic. In fitting the logistic the problem
is to satisfy the three conditions (11) or their equivalent, no matter
how. If in addition we desire the standard errors of the unknowns we
have to be careful to carry the calculation in such fashion that we
can apply the appropriate formulas*® analogous to (26). The form
of the approximation equations which one will use depends somewhat
on the form in which the logistic is written and of these there are
several,

\[
\frac{1}{a + b e^{-nt}}, \qquad \frac{K}{1 + m e^{-nt}}, \qquad \frac{B}{e^{-nt} + C}, \qquad \frac{1}{a + e^{\beta - nt}}, \qquad \text{etc.}
\]

The relations between the constants in the various cases can readily
be obtained so that from the solution in any form that in any of the
other forms can be obtained. There is one exception. As β = log b
the last form cannot be used when b turns out to be negative, for β
would have to be imaginary and it is impossible to obtain an imaginary
number through the convergence process set up for a solution by
successive approximations on the assumption that the solution is real.
As b is negative only in the hypercritical case there is not much danger
of β being imaginary. If, however, we write the second form as

\[
\frac{K}{1 + m e^{-nt}} = \frac{K}{1 + e^{\mu - nt}}, \qquad \mu = \log m,
\]

there is real danger of failing of a solution because of imaginary μ,
for the ordinary hypermalthusian but subcritical case arises rather
frequently and, in this, both K and m are negative, i.e., a is negative
and b is positive. We have used consistently the first form which
produces no discontinuity in the transition from the hypo- to the
hypermalthusian case; the second form requires that both K and m pass
through infinity in this transition; the third form is free from this
difficulty.*7 We started our calculations with the first form and have
used it consistently throughout; but we incline to believe that the third
form is somewhat to be preferred at least unless one needs the standard
error of K = B/C.

31. The Approximation Equations for the Logistic. The expression 
to minimize is 


| : l . B ; 
§ = S(E— PP =3(E- tia. 27 
( I ) (1 a + bent ) (1 pant ( ) ( () 


then 























oP —_ Pp? oP — —nt pr OP _ ht —nt pe (28) 
da —_— —_— ii 
or 
> ] »?) ) 
oP _ P oP aes” ail oP _ t ent p2 (28’) 
OB B’ a B on B 
Hence 
1 0S 1 OS 
—_ a om p+ o —_. pP\ p2 ait _ bs . —_ P\ P2,—nt 
> aq (I P)P*, > ah (I Pye | 
1 as (29) 
i V(E — P)P®hte-* 
2 On (] P) Pht 
or 
1 Os a sy, Dy P) P 1 as an ae oD P) iis 
—_— "32°" =o | 
(29°) 


1 0S 





t 
in am Che «=n. mre —nt 


2 On 








310 WILSON AND PUFFER 



































1 0S = >[pP*— 2P3(E— P)} 1 OS - YiP+— 2P3(E — P)|e-" 
2 da? —— ’ 20adb ~ . 
1 aS 
we on p> a y) 3B y —. P)\|lp>—2nt 
> 3p (I P3(f P)\e 
: te = — L{Pt — 2P°(E — P)\e-" tb (30) 
2 dadn 7 : . 
1 os f : = t 
_ = —  <[P4 — 29P3(E — P)\e-2"¢ a 2(k — P)et 
> aban [] P3(}: P)\e + ; [P2(} Pye a 
1es ff - \ 
7 — VJIIps — 9p3(pe — P)jp—2nt 4 — [payee — P)\,—ntlp2}2 
59,27 — ie 2P3(E P)\e + } [P2(E — Pye" tb 
1a°S8 l 1 @&S l 
ame aan i. Se. - 
2am BR” 2 apace pu — Pe PI 
1 0°S I 
= ia ae _ 9Dp3(K — Pp 
> 302 RB [P*— 2P3(E — P)| 
: ti = ! sip: — P?(E — P)|te! (30’) 
20Bon FP 
md - — —Y» a. DIRK .. \l4,—nt 
2 aC dn B U oe ha 
1 aS ] 
ped = — VY //P4 — 2P3(F — ?P)|e-2¢ Rik — Pye-nt! #2 
532 7 Bm au: 2P3(} P)\e + BPE Pye" * jt 


With these expressions one may set up the normal equations for δa,
δb, δn, or δB, δC, δn. So far as the solution of those equations goes
one may discard a factor b or 1/B², but if one wishes to obtain the
standard errors of the unknowns, the factors must be kept or
adjustment must be made in the formulas for those errors. The
equations are generally to be simplified by neglecting the terms in E − P
in the coefficients of the differentials of the unknowns, which
corresponds to reducing to linear expressions before squaring and thus
using equations (24) extended to three variables.
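
The second derivatives (30) for the first form of the logistic are easy to mistranscribe, so a numerical check may be useful. The sketch below, with invented data `TS`, `ES` and invented trial constants `PARAMS`, compares the analytic expressions with second differences of ½S; none of these figures or names come from the paper.

```python
import math

def logistic(t, a, b, n):
    return 1.0 / (a + b * math.exp(-n * t))

def half_s(params, ts, es):
    a, b, n = params
    return 0.5 * sum((e - logistic(t, a, b, n)) ** 2 for t, e in zip(ts, es))

def analytic_half_hessian(params, ts, es):
    """One half of the second derivatives of S, ordered (a, b, n), as in (30)."""
    a, b, n = params
    h = [[0.0] * 3 for _ in range(3)]
    for t, e in zip(ts, es):
        p = logistic(t, a, b, n)
        r = e - p                        # residual E - P
        x = math.exp(-n * t)
        q = p ** 4 - 2 * p ** 3 * r      # the bracket [P^4 - 2 P^3 (E - P)]
        h[0][0] += q
        h[0][1] += q * x
        h[1][1] += q * x * x
        h[0][2] -= q * x * t * b
        h[1][2] -= (q * x * x * b + p * p * r * x) * t
        h[2][2] += (q * x * x * b * b + p * p * r * x * b) * t * t
    for i in range(3):
        for j in range(i):
            h[i][j] = h[j][i]            # fill the symmetric lower triangle
    return h

def numeric_half_hessian(params, ts, es, step=1e-4):
    """Second central differences of S/2 with respect to each pair of constants."""
    def shifted(i, si, j, sj):
        q = list(params)
        q[i] += si * step
        q[j] += sj * step
        return half_s(q, ts, es)
    return [[(shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
              - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)) / (4 * step * step)
             for j in range(3)] for i in range(3)]

# Illustrative (invented) observations and trial constants.
TS = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ES = [1.0, 1.6, 2.4, 3.6, 4.9, 6.2]
PARAMS = (0.1, 0.9, 0.5)

for row_a, row_n in zip(analytic_half_hessian(PARAMS, TS, ES),
                        numeric_half_hessian(PARAMS, TS, ES)):
    print(["%.4f" % v for v in row_a], ["%.4f" % v for v in row_n])
```

Dropping the terms in E − P from `analytic_half_hessian` (setting `r` to zero inside the brackets) gives the simplified coefficients corresponding to equations (24) extended to three unknowns.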

32. Manner of Judging Convergence to the Solution. As Schultz³²
has pointed out, in working by successive approximations one must
watch what is happening. He examines the size of the terms of next
higher order in the expansions. This is a very troublesome procedure
which takes more time than to repeat the ordinary least squares
approximation. We have found it sufficient to watch the behavior
of the variations computed for the unknowns in successive
approximations, of the expressions which we are trying to make vanish,
namely, the derivatives of S, and of the sum of the squares S we are
trying to minimize. The sort of thing that happens may be illustrated
by the case of one unknown. Let S have a minimum C for x = 0; its
expansion is

\[
S = C + \tfrac{1}{2} a x^2 + \tfrac{1}{3} b x^3 + \cdots, \qquad dS/dx = f(x) = ax + bx^2 + \cdots.
\]

If x be an approximate solution of f(x) = 0 the correction to x is

\[
\delta x = -\frac{f(x)}{f'(x)} = -\frac{ax + bx^2 + \cdots}{a + 2bx + \cdots} = -x\left( 1 - \frac{b}{a}x + \cdots \right)
\]

and

\[
x + \delta x = x_1 = bx^2/a + \cdots, \qquad f(x_1) = bx^2 + \cdots, \qquad S(x_1) = C + \tfrac{1}{2} b^2 x^4/a + \cdots.
\]


In words, if we regard our departure from the ultimate solution as an 
infinitesimal, the expression which we are trying to make vanish will 
usually be an infinitesimal of the same order, and the expression to be 
minimized will depart from its minimum by an infinitesimal of the 
second order; the correction to the departure will be nearly equal to 
the negative of the departure, differing from that value by an infini- 
tesimal of the second order, the new value of the expression which is 
to vanish will be of the second order, and the new value of the expres- 
sion to be minimized will depart from the minimum by an infinitesi- 
mal of the fourth order. 
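
The orders of magnitude just described can be watched in a small numerical sketch. It assumes, purely for illustration, that S is exactly C + ½ax² + ⅓bx³ with the invented values a = 2, b = 1.5; the function name `newton_orders` is hypothetical.

```python
def newton_orders(a=2.0, b=1.5, x=0.1, steps=3):
    """Iterate x -> x - f(x)/f'(x) for f = dS/dx = a*x + b*x**2 and record,
    at each stage, the departure x, the derivative f, and S - C."""
    rows = []
    for _ in range(steps):
        f = a * x + b * x * x
        s_excess = a * x * x / 2 + b * x ** 3 / 3
        rows.append((x, f, s_excess))
        x = x - f / (a + 2 * b * x)
    return rows

for x, f, s_excess in newton_orders():
    print(f"x = {x:.3e}   f(x) = {f:.3e}   S - C = {s_excess:.3e}")
```

Each application roughly squares the departure x (to a factor near b/a), so the derivative falls to the second order and the excess of S over its minimum to the fourth.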

Naturally, with three unknowns and three simultaneous equations
methods are algebraically and arithmetically more complicated.
Still the general phenomena of convergence remain the same. What
we should observe as we make successive applications of the least
squares method is that the corrections δa, δb, δn (or δB, δC, δn) tend
to become more and more nearly equal to the negatives of the
departures of a, b, n from their correct values, and the derivatives of S
tend more and more rapidly to vanish, and S tends to become
constant (the departures from the minimum being of the fourth order,
they cannot well be followed unless a great many arithmetic places
are carried). The real difficulty, as in other cases of successive
approximations, is to find a set of values from which to start which are
near enough right to make the first step converge to better values. In
practice we have taken trial values of the constant a and plotted
log (1/P − a) = −nt + log b to a large scale, read off the values of
n and b, and computed S for these values. Often it has appeared
better to make several such trials in an effort to reduce S before
starting in on the long calculation by least squares. Even at that, we
have found at times that the least squares solution gave a worse
value of S than the one from which we started and it has seemed
better further to explore the values of the constants by graphical
methods before again proceeding to a least squares calculation.”

33. Illustration of Convergence for a Malthusian. To illustrate
these matters we may return to the case of the exponential P = Ce^{nt}
and fit it to the five values 1, 1.7, 3, 6, 13 for t = 0, 1, 2, 3, 4. The
formulas for the exponential are

\[
\begin{aligned}
A &= \frac{1}{2}\frac{\partial S}{\partial C} = -\sum (E - P)\, e^{nt}, &
B &= \frac{1}{2}\frac{\partial S}{\partial n} = -\sum (E - P)\, C e^{nt}\, t, \\
L &= \frac{1}{2}\frac{\partial^2 S}{\partial C^2} = \sum e^{2nt}, &
M &= \frac{1}{2}\frac{\partial^2 S}{\partial C\,\partial n} = \sum \left[ C e^{2nt} - (E - P)\, e^{nt} \right] t, \\
N &= \frac{1}{2}\frac{\partial^2 S}{\partial n^2} = \sum \left[ C^2 e^{2nt} - (E - P)\, C e^{nt} \right] t^2,
\end{aligned}
\]

and we shall denote by M′ and N′ the values of M and N in which
E − P is put equal to 0; when near the minimum M and M′ are nearly
equal since M − M′ = 0 is one of the conditions which must be satisfied,
but N − N′ does not vanish at the limit though in most cases it is
small compared with N or N′. To get a start we fit log P to nt + b
where b = log C (without weights). We find C = .922179 and n =
.639103 and S = 1.419636. We write the linear equations for least
squares as

\[
A + L\,\delta C + M\,\delta n = 0, \qquad B + M\,\delta C + N\,\delta n = 0.
\]

The values for the next approximation are

\[
\begin{gathered}
A = -11.36717, \qquad B = -45.663384, \qquad L = 229.89432, \\
M = 718.4296, \qquad N = 2469.5106, \\
M' = 767.9464, \qquad N' = 2661.6705.
\end{gathered}
\]











[Table, printed sideways in the original: results of the successive least squares approximations for the Malthusian fitted to the five values above. Columns: line number, C, n, S, A, B, L, M (or M′), N (or N′), δC, δn, the new values of C, n, S, the departures C − C_m, n − n_m, S − S_m, and the line to which each step leads. The entries are not recoverable from this copy.]

It is observable that the equations involving M′, N′ are considerably
different from those involving M and N. The solutions are different:

\[
\delta C = -.0917815, \quad \delta n = .0451919; \qquad \delta C' = -.2171079, \quad \delta n' = .0797960.
\]

With the correct equations, S is reduced to .351956 and with the
primed equations only to .405617. The new values of the constants
are

\[
C = .8303975, \quad n = .6842949; \qquad C' = .7050711, \quad n' = .718899.
\]

The results of the various approximations by least squares are given
in the preceding Table. (As there are sixteen columns in the Table
it has had to be broken in printing but the lines have been numbered
so as to guide the reading.)

34. Discussion of the Convergence. In this table we have two chief lines of descent from the original estimate. One goes to line 3, thence to line 4, thence to line 5 where it stops with evidently a pretty good fit. In this line of descent we have used the equations in $M$ and $N$, i.e., the equations in which $E - P$ is not neglected in the expressions for the second derivatives, the equations we have called correct or complete. In the other line of descent we go from the original estimate to line 2, thence to line 6 with a recalculation in lines 7, 8, 8′ for check. In this line of descent we have used $M'$, $N'$, the incomplete values. In other words, in this case we got a better solution out of two applications of least squares with the incomplete form than out of three with the complete form, at least in the sense that $C - C_m$, $n - n_m$, $S - S_m$, where the subscript $m$ denotes the values at minimum, are all better at the end of line 2′ than at the end of line 4. But it must not be inferred that the simpler equations give invariably the better values, for if we compare lines 5 and 5′ we find that the departures in 5′ are much larger than those in line 5. The method of least squares attempts to find the right values of the constants, namely, the values determined by that method, those which we accept, those from which we measure variations. Until the answer has been found to as many places as one determines to find it (in this case six) we cannot use departures from the right values as a criterion of nearness to the final solutions. We cannot use the value of S with any assurance. Thus in line 1 the value of S is considerably less than that in line 1′; yet the solution is not so good. Also if we look at A and B, which are to vanish, as tabulated in lines 2 and 3 we find that, like S, they are less for the poorer solution. Thus criteria of excellence based (1) on the departures of the unknowns from their values at minimum and (2) on the values of the quantities we are trying to minimize or make vanish are here inconsistent.

35. The Quadratic Expansion at Minimum. These matters may be discussed from the point of view of the quadratic in $\delta C$, $\delta n$ at the minimum

\[ S - S_m = L\,\delta C^2 + 2M\,\delta C\,\delta n + N\,\delta n^2 = 427.79\,\delta C^2 + 2255.9\,\delta C\,\delta n + 3053.3\,\delta n^2. \]

For a given value of $S - S_m$ small enough so that the terms of the third order are negligible, the locus in $\delta C$, $\delta n$ is an ellipse with its axes at an angle to the axes of C and n and with very dissimilar major and minor axes. The ratio of the axes is 18 or 19 to 1. The linear equations $427.79\,\delta C + 1127.95\,\delta n = 0$ and $1127.95\,\delta C + 3053.3\,\delta n = 0$, which we have to solve for $\delta C = \delta n = 0$, are inclined to one another at a very small angle and the major axis of the ellipse lies between them. Along one $\delta C/\delta n = -2.64$, along the other $\delta C/\delta n = -2.71$. If then $\delta C/\delta n$ is in the neighborhood of $-2.7$ the values of $\delta C$ and $\delta n$ for given departure of S from $S_m$ may be very much larger, possibly 18 times larger, than if the ratio $\delta C/\delta n$ is about $+.37$. The ratio $(C - C_m) : (n - n_m)$ in line 1 of the table is $-3.0$ and $S - S_m$ is .14, whereas in line 1′ the ratio is $+1.6$ and $S - S_m = .19$ although the values of $C - C_m$ and $n - n_m$ are both decidedly smaller than in line 1, and in line 2 the ratio is $-2.2$ and $S - S_m$ is only .0007 although $C - C_m$ and $n - n_m$ are more than half their magnitude in line 1′. In the case of the three constant logistic we shall find ordinarily a much elongated ellipsoid so that one set of values near to the correct ones may have a much larger value of $S - S_m$ than another set which is much further away.
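The elongation of the ellipse can be checked numerically. The following is a minimal sketch, assuming only the three coefficients quoted in the quadratic above; the function and variable names are illustrative and not from the paper. The ratio of the axes is the square root of the ratio of the eigenvalues of the matrix of the quadratic form.

```python
import math

# Coefficients of S - S_m = L*dC^2 + 2*M*dC*dn + N*dn^2, as quoted in the text.
L, M, N = 427.79, 1127.95, 3053.3


def quadratic_form_eigenvalues(l, m, n):
    """Eigenvalues (smaller first) of the symmetric matrix [[l, m], [m, n]]."""
    half_trace = (l + n) / 2.0
    radius = math.hypot((l - n) / 2.0, m)
    return half_trace - radius, half_trace + radius


lam_small, lam_large = quadratic_form_eigenvalues(L, M, N)

# For a fixed value of S - S_m the semi-axes are proportional to 1/sqrt(eigenvalue).
axis_ratio = math.sqrt(lam_large / lam_small)

# Slopes dC/dn along the two normal equations L*dC + M*dn = 0 and M*dC + N*dn = 0.
slope_first = -M / L
slope_second = -N / M

print(f"axis ratio   : {axis_ratio:.2f}")
print(f"dC/dn slopes : {slope_first:.2f}, {slope_second:.2f}")
```

The two slopes differ only in the second decimal, which is the numerical face of the near-degeneracy described in the text.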

36. England and Wales, 1801-1911. After this discussion of method 
we may take some fits beginning with the case of England and Wales.*° 
Yule has given three arithmetic solutions and by feeling around with 
the graphical method we obtained one which had a smaller sum of 
squares. The constants and the quantities which have to vanish are 
given for these four solutions in the table and below them the con- 
stants arising in the first least squares solution from the graphic and 
the constants for a second application of least squares.* 








































                      a            b           n           S
Yule 1) ........   .010964      .10151      .16358      .44
Yule 2) ........   .010277      .10169      .16006      .35
Yule 3) ........   .0099946     .10137      .15764      .39
Graphic) .......   .010000      .10186      .15888      .327
Least Sq. 1) ...   .0093274     .101603     .155317     .30792
Least Sq. 2) ...   .00931790    .1016265    .1553480    .30739

                   ΣP²(E−P)     ΣP²(E−P)e^(−nt)    ΣP²(E−P)t·e^(−nt)
Yule 1) ........   −417         −109               −941
Yule 2) ........   −100         −26                −272
Yule 3) ........   [illegible]  124                1153
Graphic) .......   −55          −6                 −162
Least Sq. 1) ...   +48          +11                +103
Least Sq. 2) ...   −.03         −.008              −.074














The corrections to a, b, n of the graphic solution found in the first least squares were $\delta a = -.0006726$, $\delta b = -.000257$, $\delta n = -.003563$, whereas for the second least squares solution they were $\delta a = -.00000950$, $\delta b = +.0000235$, $\delta n = +.0000310$ (the values which carry line Least Sq. 1 of the table into line Least Sq. 2). There seems to be no doubt but that the solution found for a, b, n would be improved exceedingly little by another application of least squares. Assuming that solution can be treated as a true least squares solution for the evaluation of the standard errors of the constants we have

\[ \sigma_a = .001065, \qquad \sigma_b = .000894, \qquad \sigma_n = .00488. \]

Yule's best solution judged by the value of S or of its derivatives is his second. Our graphic is better by these criteria. His third solution, however, has values of a and n nearer to the correct ones than either his second or our graphic solution. It may be noted that the limiting population forecast for England and Wales by the least squares solution is $1/a = 107.3$ million.

37. Germany, 1816-1910. It will not do to think that every case will be as plain sailing as England and Wales. Let us consider Germany. The censuses are not equally spaced in time and one cannot apply any of the three methods of Yule without interpolating for some of the intercensal populations. We may however begin by trying the graphic method. After considerable experimenting we found that a = −.03, b = .068754, n = .043749, populations being in millions and time in decades, gave S = 6.3162. A least squares procedure (simpler equations) led to a very large increase in S, which was quite unsatisfactory. Further exploration showed that a = −.05, b = .088308, n = .03177567 gave S = 5.8279. By the least squares procedure the sum of the squares again increased, becoming 6.1518. This was not a large increase and, as there is always the possibility that the values of the constants actually improve despite an increase in S, the least squares procedure was repeatedly applied to the result. We found successively








a | b | n | S | ΣP²(E−P) | ΣP²(E−P)e^(−nt) | ΣP²(E−P)t·e^(−nt)
−.05 | .0883 | .0318 | 5.83 | −885 | −656 | −558
−.0554 | .0936 | .0293 | 6.15 | 4004 | 3155 | 2386
−.05537 | .09356 | .02951 | 5.78 | −12.6 | −10 | −7.1


9.3 4.1 


— .05565} .09384| .02941 > 
oO —1.2 — 9 


5. 
— .05566)} .09385) .02940} 5.78 _ 























This is about as far as one can go carrying seven places; using 10 
places one comes to the result 


a = — .055666575, b = .093856745, n = .029400635, S = 5.7799491, 


with the three semi-derivatives of S equal to something numerically 
less than .001. There is no doubt about the convergence and no doubt 
that the logistic fit to Germany from 1816 to 1910 was hypermalthu- 
sian, showing less than no saturation. 

38. Germany, 1861-1910. The standard errors were however con- 
siderable, being about .046 for a and b and .017 for n. This raises 
sharply the question as to whether the fit is worth anything insofar 
as throwing light on the “law” according to which the population of 
Germany was growing. Fitting the period from 1861 to 1910 to a 
logistic by graphical methods we found

\[ P = \frac{1}{.08 - .0537 \times 10^{.01641t}} \]

whereas

\[ P = 33.587 + \frac{82.944}{1 + 297.546\,e^{-.472t}} \]

is the fit Pearl gives for this period using the logistic augmented by an additive constant. The following table gives the comparative results:

























Year       E         Pearl      E − P      Ours      E − P
1861     38.140     38.281     −.141      38.02      .13
1871     41.059     40.867      .192      41.27     −.21
1880     45.230     44.230     1.000      44.84      .39
1890     49.430     49.430      .000      49.80     −.37
1900     56.367     56.370     −.003      56.27      .10
1910     64.930     64.930      .000      65.02     −.09
Sum of the differences ........     1.048               −.06
Sum of absolute diffs. ........     1.336               1.28
Sum of squares of diffs. ......     1.057                .366
1855     36.114     37.173    −1.059      36.38     −.27














The fit offered by Pearl does not seem to be anything other than 
arbitrary. The fact that a four constant generalization of the logistic 
fits so much worse than the logistic is sufficient to disqualify the 
result, and incidentally it does not give so good a figure when extra- 
polated back to 1855. Of course our fit cannot be extrapolated for- 
ward very far because of its hypercritical character which carries it to 
infinity in 106 years from 1861. 
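The summary lines of the table can be recomputed from the tabulated residuals. This is a minimal sketch, assuming the residuals exactly as printed (rounded to two or three places), so the totals for our fit agree with the printed ones only to within that rounding; the function name is illustrative.

```python
# Residuals E - P for 1861-1910 as printed in the table above.
pearl_residuals = [-0.141, 0.192, 1.000, 0.000, -0.003, 0.000]
our_residuals = [0.13, -0.21, 0.39, -0.37, 0.10, -0.09]


def summarize(residuals):
    """Return (sum, sum of absolute values, sum of squares) of the residuals."""
    total = sum(residuals)
    total_abs = sum(abs(r) for r in residuals)
    total_sq = sum(r * r for r in residuals)
    return total, total_abs, total_sq


pearl_sum, pearl_abs, pearl_sq = summarize(pearl_residuals)
our_sum, our_abs, our_sq = summarize(our_residuals)

print(f"Pearl: sum {pearl_sum:.3f}, abs {pearl_abs:.3f}, squares {pearl_sq:.3f}")
print(f"Ours : sum {our_sum:.2f}, abs {our_abs:.2f}, squares {our_sq:.3f}")
```

Nearly all of Pearl's sum of squares comes from the single residual of 1.000 at 1880.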

39. Connecticut, New Jersey, New York and the Three States, 1790-1920. We have pointed out (Art. 17) that Connecticut and New Jersey were hypermalthusian in their growth from 1790 to 1920, Connecticut so much so as to be almost hypercritical. The values we find by least squares for the constants of each of these states and of New York, and of the three states together are (with the addition of the sum of squares S and of the standard deviation σ of the residuals):























                        a              b            n            S          σ
Connecticut ......   −1.638337      6.637899     .07972561    .006720     .022
New Jersey .......   −.02569654     8.173653     .2448360     .023421     .041
New York .........    .03941499     1.361848     .2437437     .798846     .239
The Three States .    .01293182      .95388168   .2209441     .639388     .214





This illustration shows that the logistics when fitted to parts of a 
region and to the total region may be quite non-additive, when extra- 
polated as forecasts. Connecticut and New Jersey being hypermal- 
thusian will become infinite, New York remains finite, and the three 
states remain finite. The asymptotic population forecast for New 
York is 25.4 million, that for the three states is 77.3 million. If we 








subtract these we have 51.9 million to divide up between Connecticut 
and New Jersey which, being more than a quarter of the asymptotic 
population forecast for the whole United States by the logistic, should 
be rather too much to be consistent with that forecast, but is unhap- 
pily not enough to satisfy the logistic forecast for Connecticut or New 
Jersey individually! 
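The arithmetic behind this clash of forecasts can be sketched as follows, assuming the logistic in the form P = 1/(a + b e^(−nt)) used throughout, so that the asymptote is 1/a when a is positive and there is no finite asymptote when a is negative. The dictionary and function names are illustrative only.

```python
import math

# Least squares values of the constant a from the table of Art. 39 (populations in millions).
a_values = {
    "Connecticut": -1.638337,
    "New Jersey": -0.02569654,
    "New York": 0.03941499,
    "The Three States": 0.01293182,
}


def asymptote(a):
    """Upper asymptote 1/a of P = 1/(a + b*exp(-n*t)); infinite when a <= 0."""
    return 1.0 / a if a > 0 else math.inf


forecasts = {name: asymptote(a) for name, a in a_values.items()}
for name, k in forecasts.items():
    print(f"{name:18s} {k:8.1f}")

# What the combined forecast leaves for Connecticut and New Jersey,
# using the rounded figures as the text does.
remainder = round(forecasts["The Three States"], 1) - round(forecasts["New York"], 1)
print(f"left for Connecticut and New Jersey: {remainder:.1f} million")
```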

40. Reed and Pearl on Connecticut. The table gives the data E for Connecticut (in thousands), the figures P obtained by Reed and Pearl to fit this data, and the deviations; then, as P′, the populations calculated from the first graphic straight line obtained by plotting E directly on semi-log paper and reading off the constants to get $P' = 10^{2.09 + .080t}$, and the deviations $E - P'$; the next two columns contain the sums of the squares of the deviations cumulated from the end; and the last column the values of $E - P''$ for our least squares logistic of Art. 39.

















Year     E      P     E − P     P′    E − P′   Σ(E−P)²   Σ(E−P′)²   E − P″
1790    238    116     122     123     115      47810     39590       38
1800    251    142     109     148     103      32926     26365       28
1810    262    172      90     178      84      21045     15756       13
1820    275    210      65     214      61      12945      8700       −4
1830    298    255      43     257      41       8720      4979      −16
1840    310    309       1     309       1       6871      3298      −45
1850    371    374      −3     372      −1       6870      3297      −33
1860    460    453       7     447      13       6861      3296       −3
1870    537    547     −10     537       0       6812      3127        2
1880    623    659     −36     646     −23       6712      3127       −2
1890    746    792     −46     776     −30       5416      2598        7
1900    908    948     −40     933     −25       3300      1698       18
1910   1115   1131     −16    1122      −7       1700      1073       18
1920   1381   1343     +38    1349     +32       1444      1024      −15




















It is seen that the two-constant Malthusian fits better than their 
logistic (three-constants) and that judged by the sum of squares it 
fits decidedly better over any period terminating in 1920; indeed as 
the differences E — P’ are practically all less than E — P it would be 
difficult to find any considerable period for which this Malthusian fit 
failed to be better than their logistic.*® 
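The two cumulated columns of the table can be reproduced from the deviations alone. The sketch below assumes the deviations as tabulated (in thousands) and uses an illustrative helper name; it cumulates the squares from 1920 backward, which is what makes the comparison "over any period terminating in 1920" immediate.

```python
# Deviations for 1790, 1800, ..., 1920 as tabulated (thousands).
dev_logistic = [122, 109, 90, 65, 43, 1, -3, 7, -10, -36, -46, -40, -16, 38]    # E - P
dev_malthusian = [115, 103, 84, 61, 41, 1, -1, 13, 0, -23, -30, -25, -7, 32]    # E - P'


def cumulate_from_end(residuals):
    """Sums of squared residuals cumulated from the last entry back to each entry."""
    total = 0
    cumulated = []
    for r in reversed(residuals):
        total += r * r
        cumulated.append(total)
    return cumulated[::-1]


cum_logistic = cumulate_from_end(dev_logistic)
cum_malthusian = cumulate_from_end(dev_malthusian)

for year, s_log, s_mal in zip(range(1790, 1930, 10), cum_logistic, cum_malthusian):
    print(year, s_log, s_mal)
```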

41. Reed and Pearl on Additivity. Wilson and Luyten illustrated 
the non-additivity of the logistic by citing the simple example 














\[ E_1 = 1,\quad E_2 = 3,\quad E_3 = 8; \qquad E_1' = 1,\quad E_2' = 4,\quad E_3' = 9 \]
\[ E_1 + E_1' = 2,\quad E_2 + E_2' = 7,\quad E_3 + E_3' = 17 \]
\[ K = 33,\quad K' = 12.57,\quad K + K' = 45.57 \]

but with $K''$, the asymptotic population for the summed data, equal to 30.33, which is not only less than the sum $K + K'$ but actually less than $K$. In the effort to show the practical additivity of the logistic, Reed and Pearl developed formulas for the addition of logistics based on fitting logistics through fitting their derivatives (not the logistics themselves) by the method of moments throughout the whole range of time $-\infty$ to $+\infty$. This method forces the sum of the asymptotes to equal the asymptote of the sum. It is not a method which would seem directly applicable to the data, and, even if it were, it is not a method which can meet the point raised by Wilson and Luyten, and mentioned several times in the present paper (Arts. 11, 38), that if logistics are fitted to the data for two or more regions and for the combined regions, the forecasts will not in fact be additive. Wilson and Luyten did not say and we would not say that given two or more regions, constants for their logistics and for that of the combined region could not be so selected as to make a tolerably good fit to the regions and their combination both over the time covered by the enumerations and even over the forecast to $t = \infty$. We do not consider their fit to Connecticut as tolerable; it certainly is not a least squares fit, the sum of the squares is 47810 (the total of the cumulated column in the table of Art. 40) instead of 6720 and the constant a has the wrong sign so that the equation is of the wrong type. We do not consider that anything from p. 737 to and including the first paragraph of page 742 of the paper by Reed and Pearl is tenable, either in theory or in practice, but that, naturally, is for others to judge.
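The simple example of Wilson and Luyten can be reproduced exactly. This is a minimal sketch, assuming that each logistic is passed through three equally spaced values by way of the reciprocals, 1/P = a + b r^t with t = 0, 1, 2, so that the successive differences of 1/P stand in the ratio r; the function name is illustrative and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def three_point_asymptote(p1, p2, p3):
    """Asymptote K = 1/a of the logistic 1/P = a + b*r**t through P(0), P(1), P(2)."""
    y1, y2, y3 = (Fraction(1) / Fraction(p) for p in (p1, p2, p3))
    r = (y3 - y2) / (y2 - y1)   # ratio of successive differences of 1/P
    b = (y2 - y1) / (r - 1)
    a = y1 - b
    return 1 / a


K = three_point_asymptote(1, 3, 8)
K_prime = three_point_asymptote(1, 4, 9)
K_summed = three_point_asymptote(2, 7, 17)

print(float(K), round(float(K_prime), 2), round(float(K_summed), 2))
```

The asymptote for the summed data falls below K alone, as the text states.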

42. The Three States by Relative Residuals 1790-1920. We fitted a logistic by relative residuals based on minimizing the sum of $\{(E - P)/P\}^2$ starting from the values of the constants given by Wilson and Luyten. We will give three sets for the constants: (1) those from which we started, (2) those obtained by least squares and (3) those obtained by applying Pearl's method previously discussed (Art. 20).























                           a           b            n           S
Wilson and Luyten ....   .045        1.264        .302        .0656
Least Squares ........   .040163     1.157186     .2781681    .0412
Pearl's Method .......   .041861     1.180323     .282974     .0435











The asymptotic population forecast by least squares with relative 
residuals is 24.9 million as contrasted with the figure of 77.3 forecast 
by the method using absolute residuals. The values of the constants 
found by a repeated application of Pearl’s computation form are no 
nearer to those given by the least squares fit with relative residuals 
than might be expected considering the sensitivity of the values to 
even moderately different weights of the residuals in the equations 
which should vanish, but they are so far from the values obtained in 
the solution by absolute residuals as to be an illustration and confirma- 
tion of our analysis of his method as giving a result near to the relative 
rather than to the absolute fit. 

43. New York City and Environs 1790-1920. We have least squares 
solutions for a logistic to New York City and Environs, as defined by 
Pearl and Reed,”° by absolute and by relative residuals and a fit not 
carried quite to a finish for Area III (1850-1920) by absolute re- 
siduals. The values of the constants are: 




















                                             a            b            n           S
New York City and Environs (Absolute) ..   .03664116    6.207466     .3414512    .2326
New York City and Environs (Relative) ..   .00266301    4.946915     .2989057    .1146*
Area III, 1850-1920 (Absolute) .........   −.41059      4.1451**     .14739      .000552

* S = Σ(E − P)²/P² in this case as the relative residuals are minimized.
** The origin of time is here 1850 instead of 1790 as it is for the others.


The asymptotic population of Greater New York is 27.3 by absolute 
and 375.5 by relative residuals. Thus the relative residual solution 
makes the city ultimately larger than the three states (Art. 42) in 
which it lies (24.9) and larger than the whole United States.°® The 
absolute residual method makes the suburban area III hypermal- 
thusian though the total area is hypomalthusian by this method. 
We see here other illustrations of the non-additivity of the logistic 
and of the really serious clashes in forecasts which arise therefrom. 
















We see also a good illustration of the instability of the method. If 
we use relative residuals we get for greater New York City the very 
large value 375 but for the three states a figure of 25 million which is 
not inconceivable,—indeed they had 15 million in 1920 out of a total 
of 106 for the United States and if the United States is ultimately to 
have about 197, the three states would in proportion have about 25 
million. On the other hand absolute residuals give for Greater New 
York 27 million which appears rather high and for the three states 77, 
which is very high. 

44. A Method Which Might Replace Least Squares. The method of 
least squares is obviously very tedious to apply. The method of using 
three points or Yule’s summation of reciprocals or his other schemes 
are somewhat indefinite in that different groups may be applied, and 
experience with such methods has given us the impression that they 
do not give solutions very near to the least squares solution. We 
have pointed out that this solution can be expressed by the vanishing 
of the sum of the residuals weighted in various ways, and so can 
Pearl’s solution and the method of moments. It seems as though it 
might be well to set up a standard method of fitting the logistic based 
directly on such equations using all the data. Now 


\[ \frac{1}{P} = a + be^{-nt}, \qquad \frac{1}{P} - \frac{1}{E} = a + be^{-nt} - \frac{1}{E}, \]

and

\[ \sum w\left(\frac{1}{P} - \frac{1}{E}\right) = \sum w\left(a + be^{-nt} - \frac{1}{E}\right) = 0 \]

with any system of weights $w$ which can be calculated directly from the data will give an equation linear in a and b and algebraic in $r = e^{-n\tau}$, where $\tau$ is a common divisor of the periods between the censuses. Three such equations with different systems of weights $w$, $w'$, $w''$ will enable one to eliminate a and b and have an algebraic equation for r. Now the weights are for

Absolute Residuals            P²E       P³E       P²E(K − P)t
Relative Residuals on P       E²/P      E²        E²(K − P)t/P
Relative Residuals on E       P²/E      P³/E      P²(K − P)t/E
Pearl's Method                E         E²        E²(K − P)t/P
Moments                       PE        PEt       PEt²


In none of these systems is $w$ calculable from the observations but it would become so calculable for the method of moments if we replace P by E, choosing as weights $E^2$, $E^2t$, $E^2t^2$; similarly if we place $P = E$ in the other weights, except for those in the final column where the asymptotic population enters. In those cases in which P remains much smaller than K, we might replace $K - P$ by $K$ in the last set of weights and reject the K from the equation in which it enters, but this is precisely the case in which the asymptote is so far above the observed values as to make it futile to determine K. It would seem that one might adopt a standard set of weights corresponding to no least squares solution, say $E^2$, $E^3$, $E^4$, and make

\[ \sum(E - P) = 0, \qquad \sum(E - P)E = 0, \qquad \sum(E - P)E^2 = 0 \qquad (31) \]

the last two equations being identical with the two first arising in fitting by absolute residuals insofar as P may be put equal to E in the weights, and the first expressing the fact that the sum of the residuals shall vanish. This should give a solution fairly near to that for absolute residuals by least squares. We should correspondingly take as weights $1$, $E$, $E^2$ for the case of relative residuals.

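45. Application of this Method. We have obtained some solutions for r (and hence n) and for a and b based on the equations

\[ \sum\left(a + be^{-nt} - 1/E\right)E^2 = 0, \qquad \sum\left(a + be^{-nt} - 1/E\right)E^3 = 0, \qquad \sum\left(a + be^{-nt} - 1/E\right)E^4 = 0 \]

leading to

\[ \begin{vmatrix} \sum E^2 & \sum E^2 e^{-nt} & \sum E \\ \sum E^3 & \sum E^3 e^{-nt} & \sum E^2 \\ \sum E^4 & \sum E^4 e^{-nt} & \sum E^3 \end{vmatrix} = 0 \qquad (32) \]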


It is interesting to compare them with the least squares solutions 












































                             a           b          n          S
Greater New York ........  .031380     5.7282     .33056     .2368
Least Squares ...........  .036641     6.2075     .34145     .2326
Connecticut .............  −1.654      6.603      .07882     .00676
Least Squares ...........  −1.638      6.638      .07973     .00672
The Three States ........  .017597     1.0002     .23125     .6651
Least Squares ...........  .012932     .9538      .22094     .6394
Chicago (1840-1920) .....  .18477      9.5231     .99599     .01331
(12 Counties) ...........  .18142      9.2043     .58671     .01305
Chicago (1860-1920) .....  .16962      2.4576     .99087     .009435
(15 Counties) ...........  .17049      2.4749     .95334     .009414



















The results run a good deal better than those obtained by the summation method of Yule or by the use of three points or by the graphical-exploration method, as they must do if the extra work of applying this method is to be justified. Yet in the case of Chicago (12 counties), near as the solution seems to be to the final least squares solution, it was far enough off so that the application of the least squares procedure led to worse values for S and for the constants, so great is the sensitivity.
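The structure of equations (32) can be sketched in a few lines. The data below are synthetic, generated from an exact logistic with illustrative constants a = 1, b = 4, n = .5 (not values from this paper), and all function names are hypothetical. The sketch shows only that the determinant of (32) vanishes at the true n and that a and b then follow from the first two linear equations; the search for the root in n is left out, and it should be remembered that n = 0 is always a trivial root because the first two columns then coincide.

```python
import math


def weighted_sums(E, t, n):
    """Rows (sum w, sum w*e^{-nt}, sum w/E) for the weights w = E^2, E^3, E^4."""
    rows = []
    for power in (2, 3, 4):
        s_w = sum(e ** power for e in E)                                    # multiplies a
        s_wr = sum(e ** power * math.exp(-n * ti) for e, ti in zip(E, t))   # multiplies b
        s_we = sum(e ** (power - 1) for e in E)                             # sum of w/E
        rows.append((s_w, s_wr, s_we))
    return rows


def det3(m):
    """Determinant of a 3x3 matrix given as three row tuples."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve_ab(E, t, n):
    """Solve the first two equations of the system for a and b at a given n."""
    (a1, b1, c1), (a2, b2, c2), _ = weighted_sums(E, t, n)
    d = a1 * b2 - a2 * b1
    return (c1 * b2 - c2 * b1) / d, (a1 * c2 - a2 * c1) / d


# Synthetic "census" values from an exact logistic (illustrative constants only).
t_demo = list(range(8))
E_demo = [1.0 / (1.0 + 4.0 * math.exp(-0.5 * ti)) for ti in t_demo]

det_at_true_n = det3(weighted_sums(E_demo, t_demo, 0.5))
a_hat, b_hat = solve_ab(E_demo, t_demo, 0.5)
print(det_at_true_n, a_hat, b_hat)
```

With real enumerations the determinant would be evaluated over a range of trial n and the non-trivial sign change located, after which `solve_ab` gives the corresponding a and b.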


III. THE AUGMENTED LOGISTIC.


46. The Cycle Theory. Pearl and Reed very early abandoned the logistic in favor of the formula

\[ P = d + \frac{1}{a + be^{-nt}} = d + \frac{B}{C + e^{-nt}}, \qquad (33) \]

or equivalent, wherein the logistic is augmented by an additive constant. Whenever one of the constants is additive one condition for the least squares fit is that $\sum(E - P)$, the sum of the residuals, shall be zero, provided that by a least squares fit one means the fit by absolute residuals. Although Edgeworth is very doubtful of the utility of the added constant and Yule does not use it and apparently does not favor it, we consider it worth while to discuss the matter somewhat. Although Pearl and Reed talk about a cycle theory in connection with the added constant, they do not make it clear that this is anything more than a method of interpreting the equation. It should be pointed out that the use of this constant introduces a discontinuity into the law of population growth. The differential equation of the first order is

\[ \frac{dP}{dt} = \frac{d}{dt}(P - d) = n(P - d)\left[1 - a(P - d)\right]; \qquad (34) \]

the excess population over the lower asymptote d grows on a logistic and the law must change every time that d changes. It is not clear why it should be the excess population which grows in this way, i.e., why the effect of the conditions in the earlier "cycle" should persist for generations in the new in the form specified. The constant n is no longer the natural rate of growth of a sparse population but the rate of growth of one scarcely above its lower asymptote. In the case of Germany where Pearl claims two cycles it is noteworthy that the lower asymptote of the second cycle is well below the upper asymptote of the first so that the change takes place without the realization of either asymptote: the population jumps from one cycle considerably before it is finished into another already well begun; such a change is entirely analogous to the simpler change suggested by Knibbs.

47. The Conditions for a Minimum; the Imaginary Case. It would seem unnecessary to give the long equations which arise from applying the methods of the calculus to the minimization of

\[ S = \sum(E - P)^2 = \sum\left(E - d - \frac{B}{C + e^{-nt}}\right)^2 = \sum\left(E - d - \frac{1}{a + be^{-nt}}\right)^2 \qquad (35) \]

or any of the equivalent forms. The conditions which must be satisfied at the minimum are

\[ \sum(E - P) = 0, \qquad \sum(E - P)(P - d) = 0, \qquad \sum(E - P)(P - d)^2 = 0, \]
\[ \sum(E - P)(P - d)\,t\,(1/a - P + d) = 0. \]

It has been pointed out that in the simple case of determining the four constants to pass the curve through four enumerations equally spaced in time, the equations for the constants are quadratic and their solution may lead to imaginary numbers. A solution involving imaginaries we take to be useless, and in any case it could not be found by successive approximations based on the usual formulas involving real quantities only. The types of augmented logistic include the hypomalthusian, Malthusian, hypermalthusian, critical and hypercritical (optionally) previously discussed, each with an added constant, and further the "imaginary" case. As in the form involving B and C, two of the constants, d and B, enter linearly they may be eliminated by the usual procedure and the problem of minimization may be reduced to one involving two variables only, namely,

\[ S = \sum(E - E_m)^2 - \frac{k\left[\sum \dfrac{E - E_m}{C + e^{-nt}}\right]^2}{k\sum\left(\dfrac{1}{C + e^{-nt}}\right)^2 - \left(\sum\dfrac{1}{C + e^{-nt}}\right)^2} \qquad (36) \]

where $k$ is the number of observations and $E_m$ is the mean value of E.

48. Convergence in the Imaginary Case. An interesting mathematical question is: What if anything does one get by applying the ordinary least squares procedure to the case in which the constants are imaginary? We may investigate this for the simple case of four observations supposed to be

\[ E_1 = 2, \quad E_2 = 3, \quad E_3 = 4, \quad E_4 = 7, \qquad \text{at } t = 0, 1, 2, 3. \]

Here the imaginary solution is

\[ r = \tfrac{1}{3} \pm \tfrac{1}{3}\sqrt{-8}, \qquad d = 3 \mp \tfrac{1}{2}\sqrt{-8}, \qquad C = \tfrac{1}{3} \pm \tfrac{1}{3}\sqrt{-8}, \qquad B = -\tfrac{8}{3} \pm \tfrac{1}{3}\sqrt{-8}. \]

We assume a value of r, substitute it and various trial values of C into S in the form (36) to find for C that value which minimizes S. The values of B and d, the eliminated constants, may then be had from

\[ B = \frac{k\sum A(E - E_m)}{k\sum A^2 - \left(\sum A\right)^2}, \qquad d = E_m - \frac{\sum A(E - E_m)}{k\sum A^2 - \left(\sum A\right)^2}\sum A, \qquad \text{if } A = \frac{1}{C + e^{-nt}}. \qquad (37) \]

In this way we obtain the following table (converted to our usual variables a, b).







( l S 





.45 . 90 .124 
. 665 004 .O81 
.007 . 896 .059 
. 263 .039 .0525 
.824 — .4198 . 05202 

. 999 —84.8868 . 2765 — .46668 .052012 

. 999000 —84.8872 . 2768 — .46697 .052012 
1.001001 +84.8868 —84.4989 — .47846 .052012* 


. 0689 
| — .0250 
| — .181 
9 — 447 
.99 — 8.427 


OO mt i 


QO CO 
Gr on 

















* The next to the last line was obtained from the preceding by least squares. 
In the last line we are over on the hypercritical side using the reciprocal of 
r = .999 which results in a curve identical with that for r = .999 but with 
a different equation.® 
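The elimination of d and B can be sketched as follows. This is a minimal illustration, assuming the four observations 2, 3, 4, 7 at t = 0, 1, 2, 3 and the standard reduction of a linear regression of E on A = 1/(C + r^t); the function name is hypothetical. Since Python carries complex arithmetic, the same formulas can also be evaluated at the imaginary solution, where S should vanish because that curve passes through all four points.

```python
import math


def eliminate_linear_constants(E, t, r, C):
    """Given trial r = e^{-n} and C, return (B, d, S) after eliminating B and d."""
    k = len(E)
    E_mean = sum(E) / k
    A = [1 / (C + r ** ti) for ti in t]
    sum_A = sum(A)
    sum_AA = sum(a * a for a in A)
    sum_AE = sum(a * (e - E_mean) for a, e in zip(A, E))
    denom = k * sum_AA - sum_A * sum_A
    B = k * sum_AE / denom
    d = E_mean - B * sum_A / k
    S = sum((e - E_mean) ** 2 for e in E) - k * sum_AE ** 2 / denom
    return B, d, S


E_obs = [2, 3, 4, 7]
t_obs = [0, 1, 2, 3]

# A real trial pair (r, C): the reduced S agrees with the directly computed sum of squares.
B_real, d_real, S_real = eliminate_linear_constants(E_obs, t_obs, 0.9, 0.5)
S_direct = sum((e - d_real - B_real / (0.5 + 0.9 ** ti)) ** 2 for e, ti in zip(E_obs, t_obs))

# The imaginary solution r = C = 1/3 - (1/3)sqrt(-8): an exact fit with complex B and d.
r_imag = complex(1 / 3, -math.sqrt(8) / 3)
B_imag, d_imag, S_imag = eliminate_linear_constants(E_obs, t_obs, r_imag, r_imag)

print(S_real, S_direct)
print(B_imag, d_imag, abs(S_imag))
```

No real pair (r, C) reaches S = 0 for these four points, which is why the real least squares procedure can only creep toward the critical limit described in the next article.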


49. The Augmented Critical and the Imaginary Case. What is happening is now apparent. A least squares procedure will take us very slowly (only with infinite repetition) to the critical solution $r = 1$ with $a = \pm\infty$ and $b = \mp\infty$ and the sum $a + b$ finite and equal to about 0.4, with d equal to about $-.470$ and S equal to 0.052012 and hence with the standard deviation of about 0.115. Although we have four disposable constants and only four points the curve will not pass through them and the minimum of S is not zero (see footnote 47). Indeed the least squares procedure is a very awkward one with which to evaluate an indeterminate form. We should fit at once the augmented critical form

\[ P = d + \frac{1}{\alpha - \beta t} = d + \frac{\gamma}{\epsilon - t} \qquad (38) \]

and for this we can proceed with (36) and (37) using $A = 1/(\epsilon - t)$ and replacing $B$ by $\gamma$, thus arriving at a minimizing problem in the single variable $\epsilon$, or we may set up the method of least squares for (38). In either case the problem is readily solved, an approximate result being

\[ P = -.4703 + \frac{11.76}{4.577 - t} \qquad (39) \]

to which we could apply the standard procedure with rapid instead of infinitely slow convergence. We thus may obtain a least squares solution but is it of any mathematical significance? Theoretically our problem is to solve the problem of determining d, a, b, n and mathematically they can be perfectly determined, but imaginary. The procedure gives not those imaginary values but real values with $d = -.4703$, $n = 0$, and a and b infinite with opposite signs. The result is certainly no approximation to the correct imaginary solution though the graphic (39) has been found (with very different values of the constants) which fits tolerably well!
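The quality of the critical fit (39) is easy to check against the four observations 2, 3, 4, 7. The following is a minimal sketch, assuming the rounded constants printed in (39), so the sum of squares agrees with the quoted 0.052012 only to about three significant figures; the function name is illustrative.

```python
import math

E_obs = [2, 3, 4, 7]
t_obs = [0, 1, 2, 3]


def augmented_critical(t, d=-0.4703, gamma=11.76, epsilon=4.577):
    """Augmented critical form P = d + gamma / (epsilon - t), constants from (39)."""
    return d + gamma / (epsilon - t)


residuals = [e - augmented_critical(ti) for e, ti in zip(E_obs, t_obs)]
S_critical = sum(res * res for res in residuals)
std_dev = math.sqrt(S_critical / len(E_obs))

# In the limit r -> 1 the sum a + b tends to 1/(P(0) - d).
a_plus_b = 1.0 / (augmented_critical(0) + 0.4703)

print([round(res, 3) for res in residuals])
print(round(S_critical, 4), round(std_dev, 3), round(a_plus_b, 2))
```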

We have used the term "imaginary" case when the constants turn out imaginary because the solution cannot then be reached by the ordinary least squares procedures based on reals; but the augmented logistic curve defined by the imaginary constants is perfectly real, namely,

\[ P = 3 + \sqrt{2}\,\tan\left[\tfrac{1}{2}(t - 1)\tan^{-1}\sqrt{8}\right]. \]

And in general since $r = e^{-n\tau}$ must be a root of an equation of the form $r^2 + \alpha r + 1 = 0$, the two values of r must be of the form $e^{\pm i\eta}$ so that n is pure imaginary and the augmented logistic takes the general form

\[ P = \alpha + \beta\tan\tfrac{1}{2}\eta(t - \gamma) \qquad (38') \]

which may be regarded as a sort of hypercritical case since the critical case (38) is in a way intermediate between (38′) and the ordinary form (33). This hypercritical form has neither upper nor lower asymptote, the population being positively or negatively infinite in finite time, namely, when $t = \gamma \pm \pi/\eta$.










50. Germany, 1816-1910. Interesting as such results are, they are 
annoying when the conditions represented by them arise in practice. 
If we undertake to fit to the enumerations of Germany a single 
augmented logistic, obtaining trial values of the constants by picking 
out four points, we obtain imaginary solutions with such large pure 
imaginary parts that it seems impossible that they arose from fluctua- 
tions in the stated enumerations. The graphic approximations ob- 
tained in exploring to reduce S indicated very small values of n (less 
than .01) and the application of the least squares procedure moved 
the constant toward zero very slowly, reducing S very little. It was 
obvious that we had truly an imaginary case and should fit the 
limiting form (38). The least squares result found was: 
999.36 


P = — 76785-+ ~— —, 8 = 5.7045 
6.616 —™ 6 







































This is not a biologically satisfactory solution; the lower asymptote 
is negative, and the population becomes infinite in finite time but it 
is undoubtedly the mathematical solution of the problem of fitting 
the German censuses 1816-1910 to an augmented logistic by least 
squares procedures. We may regard the answer as a limiting form 
of the augmented logistic and it is possible, though actually of infini- 
tesimal probability, that such a form should arise in fitting a set of 
points which lay near an augmented logistic with real constants 
(a and b large and of opposite sign, n small but not zero, d arbitrary), 
but the curve seems sure to arise when the augmented logistic fitted 
to four points is imaginary and it is extremely doubtful that it should 
then be regarded as an appropriate solution.*® It may be noted in 
passing that the value of S = 5.70 based on a four-constant fit is in a 
sense not so good as the value S = 5.78, which we had (Art. 37), 
based on a three-constant fit because the “‘errors to be feared”’ in a 
determination whose weight is unity” have to be compared on the 
basis of S divided by the number of observations diminished by the 
number of constants. We have 5.70 to divide by 11 — 4 = 7 whereas 
5.78 need be divided only by 11 — 3 = 8. 
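
The comparison just made can be checked by a short computation. The following Python sketch divides each S by the number of observations diminished by the number of constants; the function name is ours, chosen only for illustration, and nothing is assumed beyond the figures quoted above.

```python
def s_per_degree_of_freedom(s, n_obs, n_const):
    """Return S divided by (observations - fitted constants)."""
    dof = n_obs - n_const
    if dof <= 0:
        raise ValueError("need more observations than constants")
    return s / dof

# Germany, 1816-1910: eleven enumerations.
four_constant = s_per_degree_of_freedom(5.70, 11, 4)   # 5.70 / 7
three_constant = s_per_degree_of_freedom(5.78, 11, 3)  # 5.78 / 8

print(round(four_constant, 4), round(three_constant, 4))
```

On this basis the four-constant fit is the poorer of the two, although its S is the smaller.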

Although a least squares procedure operating on the ordinary 
logistic form would lead (infinitely slowly) to the critical form dis- 
cussed above, the application of a least squares procedure based upon 
the form (38’) leads directly to a solution for Germany, 1816-1910, 
near 


P = 32.285 + 18.497 tan (0.86961 t + 158.837), S = 1.26






























with a value of S much smaller than that (5.70) found for the critical 
case. This was to be expected in view of the large pure imaginary 
parts of the values of r found from four equally spaced points. Al- 
though the good fit shows the high degree of fittability of the aug- 
mented logistic, the result is not satisfactory biologically as the pop- 
ulation would have been zero in 1771 and would become infinite in 
1944. As a mere matter of curve fitting, however, it may be pointed 
out that the fit to the single four-constant augmented (imaginary or 
hypercritical) logistic giving S = 1.26 is closer than that given by 
Pearl’s two logistics® involving eight constants with S = 1.44. 
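
The dates 1771 and 1944 can be recovered from the constants of the tangent curve. The sketch below is only illustrative and rests on two assumptions which the printed formula does not state: that t is in years from 1816 and that the angle is in degrees. With these assumptions the curve gives about 25 million in 1816 and reproduces both dates.

```python
import math

# Constants of the fitted curve P = D + AMP * tan(RATE * t + PHASE).
D, AMP, RATE_DEG, PHASE_DEG = 32.285, 18.497, 0.86961, 158.837
ORIGIN_YEAR = 1816  # assumed origin of t (years); angle assumed in degrees

def population(year):
    """Population in millions given by the tangent curve."""
    angle = math.radians(RATE_DEG * (year - ORIGIN_YEAR) + PHASE_DEG)
    return D + AMP * math.tan(angle)

def year_at_angle(angle_deg):
    """Calendar year at which the argument of the tangent equals angle_deg."""
    return ORIGIN_YEAR + (angle_deg - PHASE_DEG) / RATE_DEG

# P = 0 where tan = -D/AMP, taken on the branch between 90 and 270 degrees.
zero_angle = 180.0 + math.degrees(math.atan(-D / AMP))
year_zero = year_at_angle(zero_angle)
# P becomes infinite where the argument reaches 270 degrees.
year_pole = year_at_angle(270.0)

print(round(year_zero), round(year_pole), round(population(1816), 2))
```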

51. Germany, 1816-1855 and 1861-1910. Now Pearl*' ® broke up the 
series of eleven enumerations for Germany into a first series of five to 
which he fitted one four-constant augmented logistic and a subsequent 
series of six to which he fitted another. This is using a great many con- 
stants (eight) to fit eleven points besides leaving a certain amount of 
additional freedom in the choice of the place where one would place the 
break between the two series. The graph of the German population on 
semi-log paper or the “divided differences” between the observations
show decided irregularities in the rate of growth. The arcs for
1840-1880 and 1880-1910 are concave up indicating that the logistic 
fit to these periods individually would show no saturation. The augmented
logistic fitted to the four censuses 1880-1910 gives d = 41.48,
a = .02487, b = .2383, n = .8670. The lower asymptote is impossibly
high for extrapolation back even to the preceding census nine
years earlier, the upper asymptote is 81.64, the value of n is extremely 
high but the demographic interpretation of n offers serious difficulties 
in the case of the augmented logistic. To fit augmented logistics to 
other series of four points would require interpolation of the enum- 
erated populations and the irregularities of the differences would make 
the values obtained for the populations depend somewhat on the 
method used to estimate them. But if we take 37.81 for 1860 and 
40.68 for 1870 and use 1880 and 1890 we find d = 35.88, a = .05770, 
b = .4611, n = 1.1181; if now we try to take this forward we find an
asymptote of 53.21, a population which is exceeded by 1900, the very 
next census after the last we used. These two sets of values figured 
from two sets of four points show the instability of the solution. If 
we fit to the middle four censuses the results are imaginary! The 
stability of the series of six points for fitting by the augmented logistic 
is small. How Pearl obtained his “fit” to the period 1861-1910 we
do not know; we have shown (Art. 38) that with the simple logistic 







we could make the sum of squares less than his and certainly with the 
four-constant augmented logistic one could make it still less.® 
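
The instability shown by the two sets of four points may be illustrated numerically. The sketch below assumes the augmented logistic in the form P = d + 1/(a + b e^(−nt)) with t in decades from the first of the four points used; this form is inferred from the quoted asymptotes d + 1/a and is an assumption of the sketch. The value n = .8670 for the 1880-1910 fit is our reading of the printed figure.

```python
import math

def augmented_logistic(t, d, a, b, n):
    """Assumed form: P = d + 1 / (a + b * exp(-n * t)), t in decades."""
    return d + 1.0 / (a + b * math.exp(-n * t))

def upper_asymptote(d, a):
    """Limiting population d + 1/a (valid for a > 0 and n > 0)."""
    return d + 1.0 / a

# Constants as quoted in the text; t = 0 at the first point of each set.
fit_1880_1910 = dict(d=41.48, a=0.02487, b=0.2383, n=0.8670)
fit_1860_1890 = dict(d=35.88, a=0.05770, b=0.4611, n=1.1181)

limit_late = upper_asymptote(fit_1880_1910["d"], fit_1880_1910["a"])
limit_early = upper_asymptote(fit_1860_1890["d"], fit_1860_1890["a"])

print(round(limit_late, 2), round(limit_early, 2))
# The second curve passes through the interpolated values used for 1860, 1870.
print(round(augmented_logistic(0, **fit_1860_1890), 2),
      round(augmented_logistic(1, **fit_1860_1890), 2))
```

With the rounded constants the first asymptote comes out near 81.7, against the 81.64 quoted; the second agrees with 53.21. Shifting the four points by two decades thus changes the limiting population by nearly thirty million.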

52. England and Wales, 1801-1911. The augmented logistic fitted 
by absolute residuals to England and Wales gives d = — 4.682588, 
a = .002579344, b = .07124426, n = .10722407 with S = .23982. 
Note that the lower asymptote comes out negative which is a corrobo- 
ration of Yule’s judgment? in not trying the augmented logistic and 
in contrast to Pearl’s*' positive value of d = 2.373. The value of the 
limiting population is 383.0 million instead of Pearl’s value of 73.0. 
The value of S for Pearl’s solution is .5998, well over two and a half 
times ours, and not so good as our three-constant fit nor any of Yule’s 
three solutions. The standard errors are”® 


σ_d = 4.7, σ_a = .0065, σ_b = .019, σ_n = .039


These show that the values of d and a, statistically speaking, are 
without significance. Now the population curve for England and 
Wales plotted on semi-log paper shows a distinct curvature which 
may be due to the saturation found in the logistic or to the presence 
of a value of d or to both. We have given the logistic fit (Art. 36) 
and we now give the fit to an augmented Malthusian (graphical, not
least squares)


P = d + C e^(nt), d = −6.520, C = 15.35, n = .093035


which makes S = .2435. This value of S is far nearer to the mini- 
mum for the four-constant augmented logistic than is Pearl’s, and 
is lower than our three-constant least squares logistic. We have also 
fitted graphically an augmented critical case (n = 0) of the type (38) 
with 

d = −33.507, γ = 1178.4, e = 27.89, S = .2997


The value of S is lower than for Pearl’s four-constant fit or any of 
Yule’s three solutions, or our logistic. 

Now if we are to fit any type of curve by any assigned method to 
any set of points, we solve our problem of fitting correctly only when 
we do in fact fit that type of curve by that method to those points. 
Our solution may be without physical significance, it may be without 
statistical significance, but it has mathematical validity. The problem 
of finding a “‘satisfactory”’ solution, namely, one which suits us, all 
things considered, is a different problem, one indeterminate mathe- 
matically, one determinable only personally. The results we have 
obtained for England and Wales will bear study for the light they 







may throw on the various laws of population growth under discussion. 
We can obtain most excellent fits judged by the smallness of S, the 
sum of the squares of the absolute residuals, either by a simple logistic 
(S = .3079), or by an augmented Malthusian (S = .2435), or by the
augmented critical (or limiting) type of the augmented logistic
(S = .2997), or by the standard augmented logistic (S = .2398), and
there is Pearl’s solution (S = .5998). Taking these five we have the 
following comparative table of the values of the constants: 























   S         d        a        b       n      d+1/a
 .2398     −4.68    .0026    .071    .107      383
 .2435     −6.52    0        .065    .093      ∞
 .2997    −33.51    ±∞*      ∓∞*     0*        ∞*
 .3097      0       .0093    .102    .155      107
 .5998      2.37    .0141    .135    .195       73

* bn = 1/γ = .00085, a + b = e/γ = .024; population infinite in finite
time.
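
The relations bn = 1/γ and a + b = e/γ may be verified, and the passage from the augmented logistic to the critical type illustrated, by the following sketch. It assumes the critical form to be P = d + γ/(e − t) and the augmented logistic to be P = d + 1/(a + b e^(−nt)), with t in decades from 1801; both forms are inferred from the relations in the footnote and are assumptions of the sketch, as is the trial value n = .0001.

```python
import math

# Graphical augmented critical fit for England and Wales.
D, GAMMA, E = -33.507, 1178.4, 27.89

def critical(t):
    """Assumed limiting form: P = d + gamma / (e - t)."""
    return D + GAMMA / (E - t)

def augmented_logistic(t, d, a, b, n):
    """Assumed form: P = d + 1 / (a + b * exp(-n * t))."""
    return d + 1.0 / (a + b * math.exp(-n * t))

BN = 1.0 / GAMMA      # the product b*n held fixed as n tends to zero
A_PLUS_B = E / GAMMA  # the sum a + b held fixed as n tends to zero

def nearby_logistic(t, n=1e-4):
    """Augmented logistic with small n, large b and large negative a."""
    b = BN / n
    a = A_PLUS_B - b
    return augmented_logistic(t, D, a, b, n)

print(round(BN, 5), round(A_PLUS_B, 3))
print(round(critical(11), 2), round(nearby_logistic(11), 2))
```

For 1911 (t = 11) the two curves differ by only a few hundredths of a million, which shows how a and b, large and of opposite sign, with n small, approach the critical case.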


With these facts he would be a rash prophet who would assign a 
limiting value to the population on the basis of using the augmented 
logistic. Arithmetically the trouble is that in the usual formulas for 
the standard errors, the denominator which is the determinant of the 
normal equations nearly vanishes whereas the first minors which occur 
in the numerator do not so nearly vanish. The standard errors are 
large. This is itself a warning, but it would give no adequate intima- 
tion, we think, of the very great range of which the values of the un- 
knowns are capable without largely increasing S and thus no adequate 
picture of the real instability of the problem of determining those 
unknowns or the unreliability of any particular determination of 
them. If the problem is to select from these five solutions that which 
is the best forecast we should probably select Pearl’s because it gives 
the lowest asymptote,”! but we should arrive at a forecast in a totally 
different way which would lead to a considerably smaller value of the 
asymptote.” 

53. Development About the Minimum. As we have spoken of the 
instability of the whole problem of fitting logistics, particularly aug- 
mented logistics, and as the fact that the unknowns enter non- 
linearly makes the locus S = const. non-quadratic, it is worth while 
to illustrate these matters by an example. Consider the locus S = .6
in the case of England and Wales for which the minimum value of












Fig. 2. Plot of the limits of C and r for S = .6. The scale of C on both
margins is from 0.2 to —1.0 and the scale for r is from 0.8 to 1.0. That part 
of the figure for which C > 0 is redrawn in the upper lefthand corner with the 
scale for C enlarged four times as indicated by the figures .10, .05, 0. For any 
values of C and r within the curve H E A E’ H the value of S will be less than 
0.6 if B and d are properly chosen (37). The values for the lower and upper 
asymptotes for the respective points on the contour are: 








          H      G      F      E       D      C      B
Lower    −27    −23    −9.4   −3.3     .60    2.5    2.1
Upper    −27    −33    −95    −4000    140    85     71













S is .2398. There will be a surface or spread of three dimensions in 
the four-dimensional space of d, a, b, n or d, B, C, r or other four un- 
knowns which we choose for representing the analytical form of the 
augmented logistic. The minimum point will lie inside this locus; 
so will all the points listed in the table next above. In the space of 
d, a, b, n the locus S = .6 will go to infinity for a and for b. As we 
cannot well represent the three dimensional locus in four-dimensional 
space, we may make a plot on the plane of r and C of the projection 
of the spread S = .6 upon that plane and we may give the values of 
d and K = d+ 1/a around the plot (see legend to fig. 2). One way 
to make the plot is to use (36) to find the values of C which for a 
given value of r make S = .6 and compute the values of d and B 
from (37) and the value of K from d+ B/C. The figure shows the 
results. The value of r may vary between .8145 and 1.0, C may vary
from 0.12 to −1.0 (which represents augmented critical cases); K
may vary from 69.2 to infinity (for C = 0) and run down from −∞
to small negative values (−41.8 or −27.2). The values of d range
from about +3.0 to about −42 or −27 on the two sides of the
locus. The locus itself is nothing like quadratic, but is rather crescent
shaped. Of course in the immediate vicinity of the minimum such
as S = .2400 the locus would be essentially quadratic. The standard
error of d is 4.7 so that d = −4.7 ± 4.7 but the actual range of d
for S = .6 is from +3 to −42 which is far from symmetric on −4.7
as a center; and as K runs from 69.2 to +∞ and from −∞ down to
−27 the value K = 383 plus or minus whatever the standard deviation
of K might be would give no picture of the range of K when
S = .6. Owing to the double way in which the same augmented 
logistic may be represented we could make a continuation of the plot 
for r > 1, but with no advantage. 

Fig. 2 (legend, continued).

          A      B′     C′     D′     E′     F′      G′     H
Lower    3.0    2.8     …      …      …      …       …      …
Upper    70     69     74     96     220    −260    −51    −42

The part of the horizontal line C = 0 intercepted by the curve is the locus
K = ∞ corresponding to augmented Malthusian solutions. The part of the
curve above this line corresponds to hypermalthusian solutions. The points
I, J, K, M within the curve correspond respectively to Pearl’s solution, the
least squares logistic, the least squares augmented logistic, and the graphical
augmented Malthusian.

54. Connecticut and the Three States, 1790-1920. The values we find
for the constants in fitting the augmented logistic by least squares are:































                       d            a             b           n           S
Connecticut           .1983387     .1374734     29.48463     .2865332    .001793
  Standard Error      .016         .110          8.12        .0310
Three States        −2.202367    −.04003093      .3821285    .1047145    .363807
  Standard Error     1.82          .0561         .140        .0660
  (Relative)        −1.373437    −.0113281       .4832953    .1456240    .008721 *





* S = Σ(E − P)²/E² as the relative residuals are minimized to the base E,
not P.


These solutions have not been carried through to completion due to 
the large amount of work necessary to get a graphic solution near 
enough to the minimum to obtain satisfactory results by the applica- 
tion of least squares procedure. It is noticeable that Connecticut 
which was largely hypermalthusian with the simple logistic has 
shown saturation with the augmented logistic. The sum of squares 
has been reduced from .0067 to .0018. The limiting population is 
7.33 million. The standard errors are however, rather large, but d, 
judged by its standard error, is well established as positive. The 
Three States (Conn., N. Y., N. J.) taken together have a negative 
lower asymptote and have become hypermalthusian, the population 
becoming infinite in finite time. Thus neither d nor a make sense. 
The value of S has been considerably reduced by the use of the addi- 
tional constant. Yet the constants, judged by their standard errors 
are not well established. The relative fit also shows good improve- 
ment in S but is hypermalthusian with negative d like the absolute fit. 
In fitting the simple logistics we found that both the absolute and the 
relative fits gave finite asymptotes. It may be mentioned that we 
have found two graphical fits which were better than the logistic 
(judged by the value of S), one the augmented Malthusian, the other 
the augmented critical’*—as in the case of England and Wales. 
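
How well each constant is established may be judged by the ratio of the constant to its standard error. The following sketch computes these ratios from the table above for the two absolute fits; the names used are ours, and the ratio is offered only as a rough gauge, not as a formal test.

```python
# Constants (d, a, b, n) and their standard errors as tabulated above.
FITS = {
    "Connecticut": ((0.1983387, 0.1374734, 29.48463, 0.2865332),
                    (0.016, 0.110, 8.12, 0.0310)),
    "Three States": ((-2.202367, -0.04003093, 0.3821285, 0.1047145),
                     (1.82, 0.0561, 0.140, 0.0660)),
}
NAMES = ("d", "a", "b", "n")

def ratios_to_standard_error(estimates, errors):
    """Return each constant divided by its standard error."""
    return {name: est / err for name, est, err in zip(NAMES, estimates, errors)}

RATIOS = {place: ratios_to_standard_error(*values)
          for place, values in FITS.items()}

for place, ratios in RATIOS.items():
    print(place, {name: round(value, 1) for name, value in ratios.items()})
```

For Connecticut d is about twelve times its standard error, whereas for the Three States neither d nor a reaches twice its standard error.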

55. New York City and Environs, 1790-1920. New York City 
(Areas I, II, III and the total area) were discussed by Pearl and Reed”° 
for the Plan of New York and Its Environs in 1923; in this paper 
(p. 21) they said they fitted augmented logistics by least squares to 
the population counts of the four areas. The populations of the 
separate areas I, II, III were available only from 1850 to 1920, but 
that of the total area was used from 1790 to 1920. Our augmented 
logistics with the value of S and of the asymptotic population d+ 1/a 
are given below with theirs for comparison—origin of time the initial 
time, whether 1850 or 1790. 


































                          d          a            b           n          S           d+1/a

Area I                   .448368    .09519626     2.082730   .4980710   .1244        11.0
  Pearl and Reed         “small”⁷³  .050279                  .3849348   .1567        19.9
Area II                  .1057562   .2782966     17.97468    .5115934   .002812       3.7
  Pearl and Reed         .087       .093500                  .428817    .003766      10.8
Area III                 .1358433   .2650746      7.011994   .2804549   .0004625      3.9
  Pearl and Reed         .150       .24518                   .290181    .0005574      4.2
Total Area               .0787729   .0428006      7.618335   .2626669   .2245        23.4
  Stand. Error           σ = .127   σ = .0138     σ = 2.96   σ = .0423
  Pearl and Reed         “small”    .028653                  .32300     .2500        34.9
  (1850-1920)            .712831    .06396383     1.533368   .4733573   .1761        16.3
Relative (1790-1920)     .1480515   .06229789    11.22922    .412555    .02924       16.2





In regard to these results we may note the great difference we find if 
we compute the curve for the total area on the basis of 1850-1920 
which is that on which the others must be computed, and on the 
basis of the total time 1790-1920; all the constants are markedly 
different and the terminal population is 16.3 instead of 23.4. One 
may also note the difference between the limiting population 23.4 
we find and the figure 34.9 found by Pearl and Reed. In view of the 
statistical indetermination which we have found in other problems 
we do not claim that 34.9 departs from 23.4 by more than the standard 
error, but it is nearly 50% greater and that might be regarded as a 
substantial difference in city planning. We note further that all our 
limiting populations are less than theirs, that of Area I being only 55% 
as much and that of Area II only 34% as much.“ Our indicated 
limiting populations in the three areas add up to 18.6 whereas that of the 
total area based on the same period of time is 16.3 and on the whole 
time is 23.4. If we compare these results with those obtained by the 
logistic we see that with d = 0 the limiting population of the whole 
area is 27.3 (by absolute residuals) whereas with d disposable it is 
23.4, not a great difference; on the other hand with relative residuals 
we have 375 with d = 0 but 16.2 with d disposable which shows a 
great difference and furthermore the variations are in the opposite 
direction. On Area III the straight logistic gave (by absolute re- 
siduals) a hypermalthusian growth whereas with d disposable it gave 
a hypomalthusian growth with an asymptote at 3.9. This shows how 
sensitive the value of the forecast may be to the method of fitting 
and how necessary it is to specify the method. The great differences 
between our values and those of Pearl and Reed show how necessary 







it is to carry the work through correctly. We are not addicted to 
logistics or to augmented logistics as a method of forecasting popula- 
tions but we believe that the values we have found for the three 
areas and the total area by applying the method of least squares 
with absolute residuals are not only more correct arithmetically 
than those of Pearl and Reed but more reasonable as forecasts. We 
doubt if Area I will ever have 19.9 million persons, we doubt if Area 
II will ever have 10.8 million; we believe that 11 million and 3.7 
million, respectively, are much more reasonable figures,—but in view 
of the various results which we have found for different populations 
throughout the course of this paper we attribute this reasonableness 
to accident rather than to the infallibility of the method as an ex- 
pression of natural law. 
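
The limiting populations and percentages quoted in this article follow directly from the tabulated constants. The sketch below recomputes them from our values of d and a, taking the limiting population as d + 1/a; the dictionary and variable names are ours, chosen for illustration.

```python
# Our least squares constants (d, a) from the table for New York.
OURS = {
    "Area I": (0.448368, 0.09519626),
    "Area II": (0.1057562, 0.2782966),
    "Area III": (0.1358433, 0.2650746),
    "Total 1790-1920": (0.0787729, 0.0428006),
    "Total 1850-1920": (0.712831, 0.06396383),
}
# Limiting populations of Pearl and Reed as quoted in the table.
PEARL_REED = {"Area I": 19.9, "Area II": 10.8, "Total 1790-1920": 34.9}

LIMITS = {name: d + 1.0 / a for name, (d, a) in OURS.items()}
SUM_OF_AREAS = sum(LIMITS[k] for k in ("Area I", "Area II", "Area III"))
SHARE = {name: LIMITS[name] / PEARL_REED[name] for name in PEARL_REED}

for name, value in LIMITS.items():
    print(name, round(value, 1))
print("sum of the three areas", round(SUM_OF_AREAS, 1))
print({name: round(100 * share) for name, share in SHARE.items()})
```

The three areas sum to 18.6, which lies between the two figures 16.3 and 23.4 for the total area, and the shares come out 55 and 34 per cent for Areas I and II.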

56. Chicago, 1840-1920. Monk and Jeter® worked up Chicago for 
12 counties (1840-1920) and for 15 counties (1860-1920) and came to 
the conclusion that there was too much instability to put much 
dependence on the use of the logistic or augmented logistic. We give 
a table of the constants obtained by least squares for the augmented 
logistics and the simple logistic 





                       d           a           b           n           S          d+1/a

12 Counties          −.074429     .1653623    6.811656    .5348088    .010229      6.0
  Stand. Error        .0676       .0184       1.88        .0507
15 Counties           .0547       .17901      2.8009      .58257      .009114      5.6
12 Counties           0           .1814158    9.204304    .5867093    .013054      5.5
  Stand. Error                    .0110       1.09        .0275
15 Counties           0           .1704878    2.474853    .5533423    .009414      5.9























Whether there be or be not stability the results for the limiting popu- 
lations by the logistic with and without d are reasonably consistent, 
more so than the results for New York City, and the standard 
errors are no greater fractions of a and n than in the case of New
York. It is perhaps a slight inconsistency that the limiting population
of 12 counties (with d) should be greater than that of 15 counties which
includes them, but this lack of additivity seems no greater than that 
found in other cases.” 
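
The limiting populations for Chicago may be recomputed from the constants of the table as d + 1/a. The sketch below does so for the four fits; the labels are ours, chosen for illustration.

```python
# Constants (d, a) from the table for Chicago; d = 0 for the simple logistic.
CHICAGO = {
    ("12 counties", "augmented"): (-0.074429, 0.1653623),
    ("15 counties", "augmented"): (0.0547, 0.17901),
    ("12 counties", "simple"): (0.0, 0.1814158),
    ("15 counties", "simple"): (0.0, 0.1704878),
}

# Limiting population d + 1/a in millions.
LIMITS = {key: d + 1.0 / a for key, (d, a) in CHICAGO.items()}

for (area, curve), limit in LIMITS.items():
    print(area, curve, round(limit, 1))
```

All four limits fall between 5.5 and 6.0 million.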

57. The Weights of the Observations. The weights of the observations 
are generally determined a priori from a knowledge of the reliability 
of those observations individually. In the case of fitting a growth 
curve this would mean not merely the precision of determination of 
the individual observations but their inherent variability due to 













irregularities of growth. This is in itself a difficult statistical problem 
and we know of no way to settle it for population growth; it perhaps 
has not yet been adequately discussed for growth (e. g., in body 
weight) of individuals. We may, however, now that there are avail- 
able a considerable number of populations fitted by least squares to 
logistics and to augmented logistics, give a brief statistical discussion 
of the a posteriori evidence as to the variability of the departures of the 
enumerated figures from the calculated figures. We may discuss 
this question: Having fitted the curves by absolute residuals and thus 
without weights, do the variations of the census figures from the fitted 
curve show any significant trend in time, i. e., with increase of popu- 
lation? One simple way to analyze the material is to consider the 
sum of the absolute errors for the first half of the censuses (when the 
populations were smaller) and the sum of the absolute errors for the 
second half of the censuses (when the populations were larger). We 
tabulate below, for the augmented logistic and for the logistic the 
percentage of the total arithmetic deviations which was constituted 
by the sub-total of these deviations occurring in the first half of the 
censuses (omitting entirely the middle census in case their number
is odd).


PERCENTAGE OF TOTAL DEVIATION FALLING IN FIRST HALF

                                              Augmented    Simple
                                              Logistic     Logistic
…                                               24.5        18.4
Connecticut, 1790-1920                          53.3        73.1
New Jersey, 1790-1920                            ..         58.7
New York State, 1790-1920                        ..         50.3
The Three States, 1790-1920                     16.7        40.1
England and Wales, 1801-1911                    39.6        59.5
…                                                ..         64.7
Chicago (15 counties), 1860-1920                43.9        51.2
Chicago (12 counties), 1840-1920                40.3        48.4
New York City Area I, 1850-1920                 43.9         ..
New York City Area II, 1850-1920                40.5         ..
New York City Area III, 1850-1920               32.1        28.3
New York City and Environs, 1790-1920           19.4        25.6
New York City and Environs, 1850-1920           43.9         ..
…                                               53.7         ..
Mean of the percentages                         37.6        47.1













In the 12 cases for the augmented logistic only 2 have percentages 
greater than 50 whereas 10 have them less than 50 and the mean is 
37.6; in the 11 cases for the simple logistic 6 have percentages greater 
than 50 and 5 less than 50 with the general mean 47.1; furthermore, 
in almost every case (6 out of 8) in which we have both solutions the 
percentage of error in the first half is less for the augmented logistic 
than for the logistic. From these facts we make the tentative in- 
ference that in fitting the simple logistic by absolute residuals we 
find a posteriori that there is no evidence that the variations in the 
smaller populations of earlier times were numerically less than the 
variations in the larger populations of later times and hence that there 
is no evidence that we should not fit by absolute differences, but that 
in fitting augmented logistics by absolute residuals there is consider- 
able evidence that there is a trend in the residuals with time (or with 
the population) and that a given variation is more significant if on a 
smaller population indicating that probably we should not fit by 
absolute residuals but use weights which decrease with increasing 
population.’®© Considerable further investigation would, however, 
be necessary to establish such inferences with reasonable assurance. 
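
The percentage used in this article may be stated as a short computation. In the sketch below the function is our reading of the rule given above: the middle census is dropped when the number is odd and is taken to be excluded from the total as well, which is an assumption. The residuals in the example are hypothetical. The two lists are the columns of the table, from which the counts and means quoted in the text are checked.

```python
def first_half_share(residuals):
    """Percentage of the summed absolute deviations falling in the first half.

    The middle residual is omitted entirely when the count is odd
    (assumed to be left out of the total as well).
    """
    deviations = [abs(r) for r in residuals]
    half = len(deviations) // 2
    if half == 0:
        raise ValueError("need at least two residuals")
    first = sum(deviations[:half])
    second = sum(deviations[len(deviations) - half:])
    return 100.0 * first / (first + second)

# Hypothetical residuals for five censuses (illustration only).
example_share = first_half_share([0.1, -0.2, 0.3, -0.4, 0.5])

# Columns of the table above (entries not given in the table are left out).
AUGMENTED = [24.5, 53.3, 16.7, 39.6, 43.9, 40.3, 43.9, 40.5, 32.1, 19.4, 43.9, 53.7]
SIMPLE = [18.4, 73.1, 58.7, 50.3, 40.1, 59.5, 64.7, 51.2, 48.4, 28.3, 25.6]

above_50_augmented = sum(p > 50 for p in AUGMENTED)
above_50_simple = sum(p > 50 for p in SIMPLE)
mean_augmented = sum(AUGMENTED) / len(AUGMENTED)
mean_simple = sum(SIMPLE) / len(SIMPLE)

print(example_share, above_50_augmented, above_50_simple)
print(round(mean_augmented, 2), round(mean_simple, 2))
```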

58. Forecasts and Fulfilment. This study would be incomplete if 
reference were omitted to a comparison of the enumerations of the 
census of 1930 with the forecasts from the fits based upon periods 
terminating in 1920. We have already referred to the results for the 
whole United States.” It would perhaps be unfair to use the 1921, 
1931 figures for England and Wales for comparison with fits based 
on 1801-1911 because of the gross disturbance due to the war; suffice 
it to say that all the forecasts were naturally far too high. For the 
same reason, and in greater degree, we should not make comparisons 
in the case of Germany. For Greater New York we have not been 
able to identify any of the regions but Area I. The table gives the 
comparisons where we have been able to make them. 

In respect to these forecasts and their fulfilment it may be noted 
that those for Chicago were worthless as Monk and Jeter surmised. 
The best forecast for Connecticut is that of Reed and Pearl on a curve 
type that is wrong for a least squares fit. The same is true for New 
Jersey. The least squares fit and the fit by Reed and Pearl for New 
York State are about equally good as might be expected from the 
nearness of the two solutions. For the Three States the least squares 
fit gives a better forecast than the Reed-Pearl fit, and the augmented 
logistic is better than the logistic. However, the relative augmented 













                  E        L        E−L       A        E−A       P        E−P       σ

Connecticut      1.607    1.866    −.259     1.688    −.081     1.587     .020     .020/.011
New Jersey       4.041    4.173    −.132                        3.931     .110     .041
New York        12.588   11.862     .726                       11.907     .681     .239
3 States        18.236   17.796     .440    18.553    −.317    17.363     .873     .214/.161
(Relative)               15.694     16%     18.009     1.2%                        5.4/2.5

Area I (NYC)     8.063                       7.914     .149     8.441    −.378     .125
Chicago (12)     4.979    4.381     .598     4.457     .522                        .038/.034
Chicago (15)     5.058    4.506     .552     4.471     .587                        .037/.036
































E = Census enumeration, L = least squares logistic (Arts. 39, 42, 54, 55,
56), A = least squares augmented logistic, P = Pearl and Reed figures.
When two figures for σ are given the first is for the logistic, the second for the
augmented logistic.


logistic is better than the absolute. For Area I of New York City the 
least squares augmented logistic gives a better fit than the figures of 
Pearl and Reed. From these results we can only infer that the least 
squares logistics or augmented logistics fitted by absolute residuals 
are not very satisfactory as forecasters, probably not so satisfactory 
on the whole as the Pearl-Reed fits, which though claimed to be by 
least squares, are not least squares fits.” It is worth noticing that the
errors in the forecast of the curves fitted by least squares are on the
whole much larger than the average errors σ of fitting although we
are not so far removed from the portion of the curve actually fitted
as to make a theoretical increase of importance in the deviation.
Indeed for the least squares fits the ratios of error to σ are −13, −7,
−3, +3, +2, −2, +3, +0.5, +1.2, +18, +14, +15, +16.
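
The ratios of error to σ can be recomputed from the table of forecasts. The sketch below does this for the rows in which the enumeration, the forecast and σ are all given in millions; the layout of the list is ours, and the relative fit and the Chicago rows are left out for brevity.

```python
# (population, curve, census E, forecast, sigma of the fit), from the table.
FORECASTS = [
    ("Connecticut", "L", 1.607, 1.866, 0.020),
    ("Connecticut", "A", 1.607, 1.688, 0.011),
    ("New Jersey", "L", 4.041, 4.173, 0.041),
    ("New York", "L", 12.588, 11.862, 0.239),
    ("3 States", "L", 18.236, 17.796, 0.214),
    ("3 States", "A", 18.236, 18.553, 0.161),
    ("Area I (NYC)", "A", 8.063, 7.914, 0.125),
]

# Error of forecast (E minus forecast) expressed in units of sigma.
RATIOS = [(census - forecast) / sigma
          for _, _, census, forecast, sigma in FORECASTS]

for (name, curve, *_), ratio in zip(FORECASTS, RATIOS):
    print(name, curve, round(ratio, 1))
```

The first six ratios round to −13, −7, −3, +3, +2, −2 and that for Area I to +1.2, in agreement with the figures listed above.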


IV. SUMMARY AND CONCLUSIONS. 


59. Summary. We have tried by topical headings and frequent 
summary to make clear the general nature of the results of this in- 
vestigation throughout the course of this paper, but a general brief 
summary may be added: (I) There does not appear to be any law of 
population growth which will rigorously satisfy the law of superposi- 
tion or additivity, the logistic (simple or augmented) may satisfy the 
law reasonably well or not at all. The logistic has a plurality of types 
and as fitted empirically to the data cannot be affirmed to contain the 
law of saturation of population. (II) The determination of the con- 
stants in the logistic, simple or augmented, by various simple methods 
indicates a high degree of unreliability in the determination. The 










method of least squares though yielding definite results is extremely 
laborious in its execution after a proper trial solution has been ob- 
tained from which to start the successive approximations and often 
involves a long process of trial before the proper trial solution can be 
found. (III) The approximation equations for the least squares pro- 
cedure given in standard treatises will in some exceptional cases fail 
to give the convergence at all and must be amended, though for most 
cases they are satisfactory. The constants in the form of the logistic 
should be taken so as to avoid imaginary values or we may be forced 
very slowly to limiting forms, i. e., the solution may depend on the 
analytic form of the formula. For the augmented logistic conditions 
may arise where we are forced to such limiting forms no matter how 
we take the constants. (IV) The method of least squares though 
affording definite results does not fail to indicate, through the standard 
errors of the unknowns, a considerable degree of instability in the 
values of the constants, yet the actual variability in them for fits 
which make the sum of the squares of the residuals not very much 
larger than its minimum may be out of all proportion greater than 
indicated by the standard errors. (V) A considerable number of 
actual fittings by least squares have been made so that for the first 
time one may see how such solutions behave. 

60. Tabular Summary. In the table are given in summarized 
form the values of the essential constants which we might desire to 
interpret, as they have been determined. Where the entry under d 
is blank the fit has been by a logistic. All fits are by absolute residuals
unless marked “relative.” When the entry under “limit” is “infinite”
it means “infinite in finite time.” All figures for d and for the
“limit” are in millions.

We regret that we cannot include in the table a greater amount of 
data, but the calculations are heavy. Generalization from so few
cases is hazardous but some observations may be risked. The value 
of d is negative in 6 out of 16 cases. For five of these cases in which 
we have standard errors, and presumably for the sixth, the value of 
d is not significant. However, in the five cases for which we have 
fitted also a simple logistic, there are three in which the upper limits 
of the population are totally different according as we keep the in- 
significant negative d or set d rigorously equal to zero. We regret 
that we cannot give a similar discussion for d positive because of 
insufficient time to calculate standard errors. The logistic becomes 
infinite in four out of thirteen cases, as it is for Canada.'® The aug- 







* This imaginary solution for Germany has not had a final least squares adjustment.
† σ is the standard deviation √(S/N), σ′ that adjusted for the number of constants used.











[Table, printed sideways in the original; the numerical entries are not recoverable in this copy. Its rows give the fitted constants for England and Wales; Germany (several periods, with the critical and imaginary cases); Connecticut; New Jersey; New York; The Three States; New York City and Environs; Areas I, II and III thereof; the United States; and Chicago (12 and 15 counties).]




























mented logistic becomes infinite in three out of fifteen cases if we 
exclude Germany (1816-1910) where we have to go over to the aug- 
mented critical case, and in four out of sixteen cases if we include 
it. The values of n, the survival rate in a sparse population, vary 
for the simple logistics from .029 to .587 or, if we exclude the logistics 
which become infinite, from .115 to .587 and thus offer little satisfac- 
tion to those in search of a natural survival rate. We should hesitate 
to interpret n in the case of the augmented logistic even if the range 
were not likewise exceedingly great. In the three cases in which we 
have standard errors for both the logistic and the augmented logistic 
we note that the errors for the latter are much larger than for the 
former, although the increase due merely to the introduction of an 
extra constant would not be great, and we feel confident that were
there more comparisons of this sort available we should find such a 
situation to be of frequent occurrence. 

61. Conclusions. If by the statement that the logistic, whether 
augmented or not, is the law of population growth, one means only 
that the formula is well suited to fitting the census enumerations for 
the period of a century or so when such enumerations have actually 
been made, we can take no exception to it, for we have shown that 
those enumerations can be fitted even more closely than they have 
been fitted by others. But if the statement is to be considered as 
signifying that the formula affords a rational law to such an 
extent as to permit the extrapolation of the curve for forecasting 
purposes and the interpretation of the constants as constants 
of nature, we are forced to take serious exception to it, because we 
find that there are too many instances in which the curve becomes 
infinite in finite time or has a negative lower asymptote or both and 
because the constants are too often so poorly determined as to be 
practically undetermined; in all these cases we must at least withhold 
judgment until the populations have developed so far toward satura- 
tion that the fitting of the curve will give reasonably well determined 
indications of the saturation values. If, however, we abandon the
attempt to make any preassigned method of fitting give sensible re- 
sults and avail ourselves of the very freedom furnished by the insta- 
bility of the constants we may still get a tolerably good fit to the census 
enumerations with values of the constants that are not too unreasonable,
and although this method of using the curve leaves the
determination of some of the constants personal to the individual worker,
much as a graphical extrapolation of a curve is personal to the
draughtsman, we believe that the curve thereby becomes much more useful
than otherwise to the student of population (or other types of)
growth. Verhulst said in 1844: La loi de la population nous est
inconnue, parce qu’on ignore la nature de la fonction qui sert de
mesure aux obstacles, tant préventifs que destructifs, qui s’opposent
à la multiplication indéfinie de l’espèce humaine. . . . Une
longue série d’observations, non interrompues par de grandes
catastrophes sociales ou des révolutions du globe, fera probablement
découvrir la fonction retardatrice dont il vient d’être fait mention.
[The law of population is unknown to us, because we do not know the
nature of the function which serves as a measure of the obstacles,
preventive as well as destructive, that oppose the indefinite
multiplication of the human species. . . . A long series of
observations, uninterrupted by great social catastrophes or revolutions
of the globe, will probably lead to the discovery of the retarding
function just mentioned.]
This remains true today; our series of observations has not yet been 
long enough; it is possible that the next thirty years will throw con- 
siderable light on the matter; by the end of that time we may be 
definitely rid of hypermalthusian growth and in many populations 
sufficiently near to saturation to determine whether the curve of 
growth is symmetric with respect to its half-way point and otherwise 
gives evidence of being satisfactorily close to the logistic. In the 
meantime it would seem to be of doubtful utility for demographic 
purposes to go to the trouble of fitting the curves by least squares. 


V. APPENDIX. NOTES ON LEAST SQUARES.


62. One Linear Unknown. To clear up a number of matters rela- 
tive to the method of least squares we submit the following remarks 
thereupon. For simplicity we may begin with the case of one linear 
unknown. The early discussion of the method was in terms of the 
normal law of errors which still is often made basic; we shall, however, 
adopt a strictly statistical procedure. It will be assumed that the 
$n$ observations $E_i$ $(i = 1, 2, \ldots, n)$ form a typical set or sample
and a double index $E_{ji}$ will be introduced in which $j$ is constant for
any one sample of $n$ observations. Each observation $E_{ji}$ for $i$ fixed
and $j$ variable is supposed to be any one that could be made for that
value of $i$; in other words there are $n$ universes, one for each $i$, from
which the elements $E_{ji}$ are drawn. These $n$ universes have their
statistical characters, namely their means $E_{mi}$, their variances
$\sigma_i^2$, and their higher moments $\mu_{3i}, \mu_{4i}, \ldots$ obtained
conceptually by averaging $E_{ji}$, $(E_{ji} - E_{mi})^2$, $(E_{ji} - E_{mi})^3$,
$\ldots$ over the index $j$. The problem is to determine the value of the
unknown $a_j$ which minimizes

$$S_j = \sum_i (a_j \phi_i - E_{ji})^2, \qquad i = 1, 2, \ldots, n \tag{40}$$

and to determine the mean value $a_m$ of $a_j$ and the variance $\sigma_a^2$ of $a_j$
about $a_m$ (possibly also the higher moments $\mu_{3a}, \ldots$ of $a_j$ about $a_m$),
where $\phi_i$ is a known quantity dependent on the index $i$.
We have at once, using / as a sign of division,

$$a_j = \sum_i \phi_i E_{ji} \Big/ \sum_i \phi_i^2, \qquad a_m = \sum_i \phi_i E_{mi} \Big/ \sum_i \phi_i^2$$

or

$$a_j - a_m = \sum_i \phi_i \delta_{ji} \Big/ \sum_i \phi_i^2, \qquad (a_j - a_m)^2 = \Big(\sum_i \phi_i \delta_{ji}\Big)^2 \Big/ \Big(\sum_i \phi_i^2\Big)^2 \tag{41}$$

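where $\delta_{ji} = E_{ji} - E_{mi}$ is the variation of $E_{ji}$ from its mean $E_{mi}$. The
$i$th residual is the difference

$$v_{ji} = E_{ji} - a_j \phi_i$$

and is not the same as $\delta_{ji}$. Indeed if $a_m$ be the value of $a$ when the
means $E_{mi}$ are fitted to the expression $a\phi_i$ and if $v_{mi}$ be the residual
of this fit,

$$v_{ji} - v_{mi} = \delta_{ji} - \phi_i \sum_h \phi_h \delta_{jh} \Big/ \sum_h \phi_h^2 .$$

To find the variance $\sigma_a^2$ it is necessary to average $(a_j - a_m)^2$ over all
samples $j$. Then, squaring $\sum_i \phi_i \delta_{ji}$ and averaging over $j$,

$$\sigma_a^2 = \sum_i \phi_i^2 \sigma_i^2 \Big/ \Big(\sum_i \phi_i^2\Big)^2 + \sum_{i,h} \phi_i \phi_h r_{ih} \sigma_i \sigma_h \Big/ \Big(\sum_i \phi_i^2\Big)^2, \qquad i \neq h \tag{42}$$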


The expression involves the $n$ variances $\sigma_i^2$ in the $n$ universes and the
$n(n - 1)/2$ correlation coefficients $r_{ih}$ between $\delta_{ji}$ and $\delta_{jh}$, i. e., between
the variations in the different universes. If it may be assumed that
the $n$ observations $E_{ji}$ for a given $j$ are independent, the correlation
coefficients $r_{ih}$ vanish; sometimes this assumption is valid, sometimes
it is not. If it may also be assumed that the variances $\sigma_i^2$ in the $n$
universes are all equal, $\sigma_i^2 = \sigma^2$, then there is a major reduction to

$$\sigma_a^2 = \sigma^2 \Big/ \sum_i \phi_i^2 \tag{42'}$$

and sometimes this further assumption is valid, though often it is not.
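
The reduction (42') can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes normally distributed, independent variations of equal variance, and the values of `phi`, `a_true` and `sigma` are purely illustrative.

```python
import random

def fit_one_unknown(phi, E):
    """Least-squares value of a in E ~ a*phi, as in equation (41)."""
    return sum(p * e for p, e in zip(phi, E)) / sum(p * p for p in phi)

def simulate_variance(phi, a_true, sigma, samples=20000, seed=0):
    """Draw many samples j and return the mean and variance of the fitted a_j."""
    rng = random.Random(seed)
    fits = []
    for _ in range(samples):
        # The universe means lie exactly on the law a_true * phi_i.
        E = [a_true * p + rng.gauss(0.0, sigma) for p in phi]
        fits.append(fit_one_unknown(phi, E))
    mean = sum(fits) / samples
    var = sum((a - mean) ** 2 for a in fits) / samples
    return mean, var

phi = [1.0, 2.0, 3.0, 4.0]                       # illustrative known quantities
mean_a, var_a = simulate_variance(phi, a_true=2.0, sigma=0.5)
predicted = 0.5 ** 2 / sum(p * p for p in phi)   # equation (42')
print(mean_a, var_a, predicted)
```

The simulated variance should agree with `predicted` only to within sampling error, which shrinks as `samples` grows.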


63. Higher Moments and Edgeworth’s Theorem. A word may be 
interpolated concerning the higher moments of $a_j$ about $a_m$.

$$(a_j - a_m)^3 = \Big(\sum_i \phi_i \delta_{ji}\Big)^3 \Big/ \Big(\sum_i \phi_i^2\Big)^3 \tag{43}$$

$$(a_j - a_m)^3 = \frac{\sum_i \phi_i^3 \delta_{ji}^3 + 3\sum_{i,h} \phi_i^2 \phi_h \delta_{ji}^2 \delta_{jh} + \sum_{i,h,g} \phi_i \phi_h \phi_g \delta_{ji} \delta_{jh} \delta_{jg}}{\big(\sum_i \phi_i^2\big)^3}$$

where it is assumed that no two of the indices $i, h, g$ are equal. The
mean value will involve the moments $\mu_{3i}$ of the values $\delta_{ji}$ and also
the mean values of such products as $\delta_{ji}^2 \delta_{jh}$ and $\delta_{ji} \delta_{jh} \delta_{jg}$. If it may
be assumed that the variations in the $n$ different universes are
independent the higher products containing $\delta_{ji}$ or $\delta_{jh}$ to the first power
must vanish because the mean of $\delta_{ji}$ over $j$ is zero, and then the result
is simply

$$\mu_{3a} = \sum_i \phi_i^3 \mu_{3i} \Big/ \Big(\sum_i \phi_i^2\Big)^3 \tag{43'}$$


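In a similar manner

$$(a_j - a_m)^4 = \Big[\sum_i \phi_i^4 \delta_{ji}^4 + 3\sum_{i,h} \phi_i^2 \phi_h^2 \delta_{ji}^2 \delta_{jh}^2 + \cdots\Big] \Big/ \Big(\sum_i \phi_i^2\Big)^4 \tag{44}$$

where the terms not written involve a $\delta_{ji}$ to the first power and on the
assumption of independence of the universes will vanish in the mean.
Then

$$\mu_{4a} = \Big[\sum_i \phi_i^4 \mu_{4i} + 3\sum_{i,h} \phi_i^2 \phi_h^2 \sigma_i^2 \sigma_h^2\Big] \Big/ \Big(\sum_i \phi_i^2\Big)^4 \tag{44'}$$

and

$$\mu_{4a} - 3\sigma_a^4 = \sum_i \phi_i^4 (\mu_{4i} - 3\sigma_i^4) \Big/ \Big(\sum_i \phi_i^2\Big)^4 .$$

Finally if $\beta_a$ and $\beta_{2a} - 3$ be defined as usual in the theory of moments,

$$\beta_a = \frac{\mu_{3a}}{\sigma_a^3} = \frac{\sum_i \phi_i^3 \beta_i \sigma_i^3}{\big(\sum_i \phi_i^2 \sigma_i^2\big)^{3/2}} \tag{43''}$$

$$\beta_{2a} - 3 = \frac{\mu_{4a} - 3\sigma_a^4}{\sigma_a^4} = \frac{\sum_i \phi_i^4 (\beta_{2i} - 3) \sigma_i^4}{\big(\sum_i \phi_i^2 \sigma_i^2\big)^2} \tag{44''}$$

If it can be assumed that the $n$ universes have identical statistical
properties with respect to moments these results reduce to

$$\beta_a = \beta \sum_i \phi_i^3 \Big/ \Big(\sum_i \phi_i^2\Big)^{3/2}, \qquad \beta_{2a} - 3 = (\beta_2 - 3) \sum_i \phi_i^4 \Big/ \Big(\sum_i \phi_i^2\Big)^2 \tag{45}$$

where $\beta = \beta_i$ and $\beta_2 = \beta_{2i}$ are constant with respect to $i$.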

Edgeworth emphasized the proposition that no matter what the
universe to which the observations belonged, the mean of a sample
of $n$ observations tended, as $n$ increased, to be distributed normally;
indeed, this is sometimes referred to as Edgeworth’s theorem. He
further pointed out that our statistical quantities were often really
mean values and that whenever this was so and the number in the
sample was large we were certainly justified in assuming a practically
normal distribution of the means. This theorem is a special case
where $\phi_i = 1$ of the results just obtained, at least insofar as concerns
the determination of the frequency function to fourth moments. For
in the case of means the values $i$ are permutable and we have ($a$ being
now the mean of $E$)

$$\beta_a = \frac{\beta}{n^{1/2}}, \qquad \beta_{2a} - 3 = \frac{\beta_2 - 3}{n} .$$
















The skewness as measured by $\beta_a$ falls off inversely as the square root
of the number in the sample; the kurtosis $\beta_{2a} - 3$, inversely as that
number. It is, however, clear that the theorem Edgeworth states
for the mean must be true in essence for a great variety of least
squares problems involving the fitting of an unknown $a$. Restricting
the discussion to the assumption of independence of $\delta_{ji}$ and $\delta_{jh}$, one
may observe that $\sum_i \phi_i^2 \sigma_i^2$ may be written as $n$ times the mean value
of $\phi_i^2 \sigma_i^2$ and that so long as we are operating over a more or less fixed
range for $\phi_i$ and $\sigma_i$, the value of $\sigma_a^2$ will fall off as $1/n$, that of $\beta_a$ as
$1/n^{1/2}$ and that of $\beta_{2a} - 3$ as $1/n$.
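
The rates just stated can be read off equation (45) directly. The sketch below is illustrative only: the function name and the parent values (skewness 2, kurtosis 9) are assumptions chosen for the example, and identical universes are assumed as in (45).

```python
def shape_of_fitted_unknown(phi, beta, beta2):
    """Return (beta_a, beta_2a - 3) from equation (45), identical universes."""
    s2 = sum(p ** 2 for p in phi)
    s3 = sum(p ** 3 for p in phi)
    s4 = sum(p ** 4 for p in phi)
    return beta * s3 / s2 ** 1.5, (beta2 - 3.0) * s4 / s2 ** 2

# Case of the mean (phi_i = 1): quadrupling n halves the skewness
# and quarters the excess of kurtosis.
for n in (4, 16, 64):
    skew, excess = shape_of_fitted_unknown([1.0] * n, beta=2.0, beta2=9.0)
    print(n, skew, excess)
```

With unequal values of `phi` the same function shows how far a particular design departs from the simple $1/n^{1/2}$ and $1/n$ rates.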

64. Usual Expression for Variance. In most least squares problems 
the statistical constants of the different universes corresponding to 
the different values of $i$ are unknown; we have in fact merely the
observations as given. From the solution $a_j$ we may obtain the residuals
$v_{ji}$ and the value $S_j = \sum_i v_{ji}^2$ of the sum of the squares for the best fit.
We have

$$\sum_i (v_{ji} - v_{mi})^2 = \sum_i \delta_{ji}^2 - \Big(\sum_i \phi_i \delta_{ji}\Big)^2 \Big/ \Big(\sum_i \phi_i^2\Big) .$$

The mean value (over $j$) of the two sides is readily obtained on the
assumption that $\delta_{ji}$ and $\delta_{jh}$ are uncorrelated.

$$\sum_i (v_{ji} - v_{mi})^2 = \sum_i v_{ji}^2 - 2\sum_i v_{ji} v_{mi} + \sum_i v_{mi}^2 .$$

Now the first term is $S_j$, the sum of the squares of the residuals in
the given sample and the mean value of this will be some quantity $S_m$.
The mean value of $v_{ji}$ is $v_{mi}$; and $\sum_i v_{mi}^2$ is that value of $S$, say $S_0$, which
would be obtained from fitting the means $E_{mi}$ to $a\phi_i$. Hence

$$S_m - S_0 = \sum_i \sigma_i^2 - \sum_i \phi_i^2 \sigma_i^2 \Big/ \Big(\sum_i \phi_i^2\Big) \tag{46}$$


If the further assumption be now made that $\sigma_i^2 = \sigma^2$ it is possible to
eliminate $\sigma^2$ between this equation and that for $\sigma_a^2$ to obtain

$$\sigma_a^2 = \frac{S_m - S_0}{n - 1} \cdot \frac{1}{\sum_i \phi_i^2} \tag{46'}$$

In a given case $S_m$ is not known but if $n$ is large the difference between
$S_m$ and the particular value $S_j$ in the sample is probably small. Also
$S_0$, the sum of the squares of the residuals obtained from fitting the
universe means $E_{mi}$ to $a\phi_i$, is unknown.
The usual formula for the variance of the unknown is

$$\sigma_a^2 = \frac{S_j}{n - 1} \cdot \frac{1}{\sum_i \phi_i^2} \tag{47}$$
























We may now summarize the assumptions which will suffice to 
demonstrate that formula: 

(i) The means $E_{mi}$ are practically equal to $a\phi_i$, that is, it is assumed
that the “law” $a\phi_i$ which is fitted to the observations $E_{ji}$ is the law
which fits the means so closely that the sum of the squares of the
residuals in the case of the means is negligible compared with the sum
of the squares of the residuals in the actual sample fitted.

(ii) The sample is large enough so that without important error
the value of the sum of the squares of the residuals for the sample may
be used for the mean value of that sum for all samples.

(iii) The values of $\sigma_i^2$, the variances of the possible observations
for each value of $i$, are all equal.

(iv) The variations $\delta_{ji}$, $\delta_{jh}$ of the observations from their means for
two different values of $i$ are not correlated.

It is clear that in any particular case of the application of least 
squares it would be difficult to assert that these four restrictions are 
in fact satisfied to all practical purposes; it is of course possible that 
their failures to be satisfied might result by accident in a balance of 
errors which would leave the final formula practically correct, but 
one cannot rely on such an accident. 
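
For concreteness, the usual formula (47) may be applied to a single sample as follows. This is a sketch under assumptions (i) to (iv); the data are invented for illustration and the function name is not from the text.

```python
def usual_variance(phi, E):
    """Fit E ~ a*phi by least squares and apply the usual formula (47)."""
    n = len(phi)
    sum_phi2 = sum(p * p for p in phi)
    a = sum(p * e for p, e in zip(phi, E)) / sum_phi2
    # S is the sum of the squares of the residuals of this one sample.
    S = sum((e - a * p) ** 2 for p, e in zip(phi, E))
    return a, S / ((n - 1) * sum_phi2)

phi = [1.0, 2.0, 3.0, 4.0]     # illustrative known quantities
E = [2.1, 3.9, 6.2, 7.8]       # illustrative observations
a, var_a = usual_variance(phi, E)
print(a, var_a ** 0.5)         # fitted unknown and its standard error
```

With only four observations assumption (ii) is plainly strained, which is the situation the four conditions above are meant to flag.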

65. Several Linear Unknowns. The analysis for k linear unknowns, 
the usual case, may be carried out almost letter for letter identically 
with that for one unknown provided the matrix or dyadic analysis be 
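used. The problem is to minimize

$$S_j = \sum_i (a_{j1}\phi_{1i} + a_{j2}\phi_{2i} + \cdots + a_{jk}\phi_{ki} - E_{ji})^2 = \sum_i (\mathbf a_j \cdot \boldsymbol\phi_i - E_{ji})^2 \tag{48}$$

where $\mathbf a_j$ is a “vector” unknown and $\boldsymbol\phi_i$ is a known “vector” and the
dot is the sign of scalar multiplication. Differentiate:

$$dS_j = d\mathbf a_j \cdot \nabla_a S_j = 2\, d\mathbf a_j \cdot \sum_i \boldsymbol\phi_i (\mathbf a_j \cdot \boldsymbol\phi_i - E_{ji}) = 0 \tag{49}$$

for all $d\mathbf a_j$. Hence

$$\sum_i \boldsymbol\phi_i (\mathbf a_j \cdot \boldsymbol\phi_i - E_{ji}) = 0 \quad \text{or} \quad \sum_i (\mathbf a_j \cdot \boldsymbol\phi_i \boldsymbol\phi_i - \boldsymbol\phi_i E_{ji}) = 0$$

or

$$\mathbf a_j \cdot \sum_i \boldsymbol\phi_i \boldsymbol\phi_i = \sum_i \boldsymbol\phi_i E_{ji} \quad \text{or} \quad \mathbf a_j = \Big(\sum_i \boldsymbol\phi_i \boldsymbol\phi_i\Big)^{-1} \cdot \sum_i \boldsymbol\phi_i E_{ji} \tag{50}$$

It may be observed that the dyadic $\sum_i \boldsymbol\phi_i \boldsymbol\phi_i$ is symmetrical or
self-conjugate and that it may therefore be multiplied indifferently on
either side by a vector such as $\sum_i E_{ji} \boldsymbol\phi_i$. Again introducing
$\delta_{ji} = E_{ji} - E_{mi}$ and using $\mathbf a_m$ for the unknowns obtained by fitting
$E_{mi}$ to $\mathbf a_m \cdot \boldsymbol\phi_i$ we have

$$\mathbf a_j - \mathbf a_m = \Big(\sum_i \boldsymbol\phi_i \boldsymbol\phi_i\Big)^{-1} \cdot \sum_i \boldsymbol\phi_i \delta_{ji} \tag{51}$$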
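Multiplying $\mathbf a_j - \mathbf a_m$ by itself, thus forming a symmetrical dyad, we may
write

$$(\mathbf a_j - \mathbf a_m)(\mathbf a_j - \mathbf a_m) = \Big(\sum_i \boldsymbol\phi_i \boldsymbol\phi_i\Big)^{-1} \cdot \sum_i \boldsymbol\phi_i \delta_{ji} \sum_i \boldsymbol\phi_i \delta_{ji} \cdot \Big(\sum_i \boldsymbol\phi_i \boldsymbol\phi_i\Big)^{-1} \tag{52}$$

Here we introduce assumption (iv) and take the average over $j$. The
average of the dyadic will be a symmetrical dyadic which will be
designated as $\boldsymbol\mu_{2a}$, the second moment dyadic.

$$\boldsymbol\mu_{2a} = \Big(\sum_i \boldsymbol\phi_i \boldsymbol\phi_i\Big)^{-1} \cdot \sum_i \boldsymbol\phi_i \boldsymbol\phi_i \sigma_i^2 \cdot \Big(\sum_i \boldsymbol\phi_i \boldsymbol\phi_i\Big)^{-1} \tag{53}$$

Next introduce assumption (iii) and

$$\boldsymbol\mu_{2a} = \Big(\sum_i \boldsymbol\phi_i \boldsymbol\phi_i\Big)^{-1} \sigma^2 \tag{53'}$$

where $\sigma^2$ is the variance of $\delta_{ji}$, equal for all values of $i$.
The dyadic $\boldsymbol\mu_{2a}$ gives the variances $\sigma_a^2$ of all $k$ components of $\mathbf a$ and
also the mean products $\pi$ of any two of the components. Indeed

$$\begin{aligned}
\Big(\sum_i \boldsymbol\phi_i \boldsymbol\phi_i\Big)^{-1} \sigma^2 = {} & \sigma_1^2 \mathbf u_1 \mathbf u_1 + \pi_{12} \mathbf u_1 \mathbf u_2 + \cdots + \pi_{1k} \mathbf u_1 \mathbf u_k \\
& + \pi_{12} \mathbf u_2 \mathbf u_1 + \sigma_2^2 \mathbf u_2 \mathbf u_2 + \cdots \\
& + \cdots + \sigma_k^2 \mathbf u_k \mathbf u_k
\end{aligned} \tag{54}$$

where $\mathbf u_1, \mathbf u_2, \ldots, \mathbf u_k$ are units. On the other hand

$$\begin{aligned}
\sum_i \boldsymbol\phi_i \boldsymbol\phi_i = {} & \sum_i \phi_{1i}^2\, \mathbf u_1 \mathbf u_1 + \sum_i \phi_{1i} \phi_{2i}\, \mathbf u_1 \mathbf u_2 + \cdots + \sum_i \phi_{1i} \phi_{ki}\, \mathbf u_1 \mathbf u_k \\
& + \sum_i \phi_{2i} \phi_{1i}\, \mathbf u_2 \mathbf u_1 + \sum_i \phi_{2i}^2\, \mathbf u_2 \mathbf u_2 + \cdots \\
& + \cdots + \sum_i \phi_{ki}^2\, \mathbf u_k \mathbf u_k
\end{aligned} \tag{54'}$$

Owing to the relations between reciprocal dyadics or matrices the
variances $\sigma_1^2, \ldots, \sigma_k^2$ of the $k$ unknowns and the mean products
$\pi_{12}, \ldots, \pi_{k-1,k}$ may be written as the quotients of the cofactors
of the determinant of the matrix $\sum_i \boldsymbol\phi_i \boldsymbol\phi_i$ by that determinant (all
multiplied by $\sigma^2$ which comes from the $\delta$’s).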


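Now using assumption (i) we may write

$$S_j = \sum_i \delta_{ji}^2 - \sum_i \boldsymbol\phi_i \delta_{ji} \cdot \Big(\sum_i \boldsymbol\phi_i \boldsymbol\phi_i\Big)^{-1} \cdot \sum_i \boldsymbol\phi_i \delta_{ji} \tag{55}$$

or in Gibbs’s double dot notation

$$S_j = \sum_i \delta_{ji}^2 - \Big(\sum_i \boldsymbol\phi_i \boldsymbol\phi_i\Big)^{-1} : \sum_i \boldsymbol\phi_i \delta_{ji} \sum_i \boldsymbol\phi_i \delta_{ji} \tag{55'}$$

Using (iii) and (iv), we have for the mean over $j$

$$S_m = n\sigma^2 - \Big(\sum_i \boldsymbol\phi_i \boldsymbol\phi_i\Big)^{-1} : \sum_i \boldsymbol\phi_i \boldsymbol\phi_i\, \sigma^2 \tag{56}$$

As the double dot product of a dyadic and its reciprocal is equal to the
number of dimensions, we have

$$S_m = (n - k)\sigma^2 \tag{56'}$$

Hence if $S_m$ and $S_j$ may be interchanged by assumption (ii),

$$\boldsymbol\mu_{2a} = \frac{S_j}{n - k} \Big(\sum_i \boldsymbol\phi_i \boldsymbol\phi_i\Big)^{-1} \tag{57}$$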
2a ane k: a 


This contains the usual formulas for the variances of the unknowns 
and the less common formulas for the mean products of pairs of un- 
knowns. The simplicity of the proof in matrix notation is noteworthy. 
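
As an illustration of (50) and (57), the case $k = 2$ with $\boldsymbol\phi_i = (1, t_i)$, i. e. a straight line, can be written out with an explicit inverse of the 2 by 2 matrix. The sketch below is an assumed example, not the authors' computation; the data and names are invented.

```python
def fit_two_unknowns(t, E):
    """Fit E ~ a1 + a2*t and return (a1, a2) with the moment matrix of (57)."""
    n, k = len(t), 2
    # Components of the dyadic sum_i phi_i phi_i for phi_i = (1, t_i).
    p11, p12, p22 = float(n), sum(t), sum(x * x for x in t)
    det = p11 * p22 - p12 * p12
    inv = [[p22 / det, -p12 / det], [-p12 / det, p11 / det]]
    # Right-hand side sum_i phi_i E_i of equation (50).
    r1, r2 = sum(E), sum(x * e for x, e in zip(t, E))
    a1 = inv[0][0] * r1 + inv[0][1] * r2
    a2 = inv[1][0] * r1 + inv[1][1] * r2
    S = sum((e - a1 - a2 * x) ** 2 for x, e in zip(t, E))
    scale = S / (n - k)
    cov = [[scale * v for v in row] for row in inv]
    return (a1, a2), cov

t = [0.0, 1.0, 2.0, 3.0]       # illustrative abscissas
E = [1.1, 2.9, 5.2, 6.8]       # illustrative observations
(a1, a2), cov = fit_two_unknowns(t, E)
print(a1, a2, cov)
```

The diagonal of `cov` holds the variances of the two unknowns and the off-diagonal entry their mean product, corresponding to the components written out in (54).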

66. Higher Moments. If we write for brevity $\Phi = \sum_i \boldsymbol\phi_i \boldsymbol\phi_i$, the higher
moments may be expressed in multiple product notation as

$$\begin{aligned}
(\mathbf a_j - \mathbf a_m)(\mathbf a_j - \mathbf a_m)(\mathbf a_j - \mathbf a_m)
&= \Phi^{-1} \cdot \sum_i \boldsymbol\phi_i \delta_{ji}\;\; \Phi^{-1} \cdot \sum_i \boldsymbol\phi_i \delta_{ji}\;\; \Phi^{-1} \cdot \sum_i \boldsymbol\phi_i \delta_{ji} \\
&= \Phi^{-1}\Phi^{-1}\Phi^{-1} \,\vdots\, \sum_i \boldsymbol\phi_i \delta_{ji} \sum_i \boldsymbol\phi_i \delta_{ji} \sum_i \boldsymbol\phi_i \delta_{ji}
\end{aligned} \tag{58}$$

meaning that each of the three vectors $\sum_i \boldsymbol\phi_i \delta_{ji}$ is multiplied into one
of the dyadics $\Phi^{-1}$. The resulting third moment expression is a triad
and its average will be a triadic which will express the third moments
of each of the $k$ unknowns and the mean products of the square of any
unknown into any other and of the product of any three unknowns,
all unknowns being measured from their means. Moreover if the
independence of the $\delta$’s for different $i$’s be assumed and the identity
of the values $\mu_{3i}$ be further assumed,

$$\boldsymbol\mu_{3a} = \Phi^{-1}\Phi^{-1}\Phi^{-1} \,\vdots\, \sum_i \boldsymbol\phi_i \boldsymbol\phi_i \boldsymbol\phi_i\, \mu_3 \tag{58'}$$

This is the third moment triadic for the distribution of the unknowns
about their means. It is not easy to write out the components, but
no less tedious from this than from any other expression that might
be obtained without the vector notation.

Relative to Edgeworth’s theorem we may note that if the value of
$n$ be increased and the range of $\boldsymbol\phi_i$ be more or less fixed, $(\sum_i \boldsymbol\phi_i \boldsymbol\phi_i)^{-1}$ will
vary inversely as $n$ and hence $\boldsymbol\mu_{2a}$ will vary inversely as $n$. As to $\boldsymbol\mu_{3a}$,
each $\Phi^{-1}$ varies inversely as $n$ whereas the term into which it is
multiplied varies as $n$ so that $\boldsymbol\mu_{3a}$ varies as $1/n^2$. As $\boldsymbol\mu_{2a}$ varies as $1/n$, the
skewness of the $n$-fold distribution of the $k$ unknowns would presumably
vary as $1/n^{1/2}$.

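It is possible to go on to higher moments. The fourth moment
would involve a tetradic

$$\boldsymbol\mu_{4a} = \Phi^{-1}\Phi^{-1}\Phi^{-1}\Phi^{-1} :: \sum_i \boldsymbol\phi_i \delta_{ji} \sum_i \boldsymbol\phi_i \delta_{ji} \sum_i \boldsymbol\phi_i \delta_{ji} \sum_i \boldsymbol\phi_i \delta_{ji} \tag{59}$$

where the mean must be taken over all samples $j$. Under the
assumption of independence of the $\delta$’s and of identity of the constants of the
different universes we have

$$\begin{aligned}
\boldsymbol\mu_{4a} = \Phi^{-1}\Phi^{-1}\Phi^{-1}\Phi^{-1} :: \Big[ & \sum_i \boldsymbol\phi_i \boldsymbol\phi_i \boldsymbol\phi_i \boldsymbol\phi_i\, \mu_4 \\
& + \sum_{i,h} (\boldsymbol\phi_i \boldsymbol\phi_i \boldsymbol\phi_h \boldsymbol\phi_h + \boldsymbol\phi_i \boldsymbol\phi_h \boldsymbol\phi_i \boldsymbol\phi_h + \boldsymbol\phi_i \boldsymbol\phi_h \boldsymbol\phi_h \boldsymbol\phi_i)\, \sigma^4 \Big], \qquad i \neq h
\end{aligned} \tag{59'}$$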


Now if we have a pair of dyads $\mathbf{ab}$ and $\mathbf{cd}$ any combination of them
which is distributive would be called a product. Thus not only the
tetrad $\mathbf{abcd}$ is such a product but the tetrads $\mathbf{acbd}$ and $\mathbf{acdb}$ formed
by permuting the letters are such products and the tetradic

$$\mathbf{abcd} + \mathbf{acbd} + \mathbf{acdb}$$

may also be regarded as a type of product of $\mathbf{ab}$ and $\mathbf{cd}$ (for if any one
of the four vectors $\mathbf a, \mathbf b, \mathbf c, \mathbf d$ be written as the sum of two vectors as
$\mathbf b = \mathbf b_1 + \mathbf b_2$ the product of $\mathbf{ab}$ and $\mathbf{cd}$ will be the sum of the products
of $\mathbf{ab}_1$ and $\mathbf{cd}$ and of $\mathbf{ab}_2$ and $\mathbf{cd}$). Let us denote the sum of the three
terms as $\mathbf{ab} \times \mathbf{cd}$. Then any two dyadics may be multiplied in this
fashion and in particular $\boldsymbol\mu_{2a}$ may be multiplied by itself as $\boldsymbol\mu_{2a} \times \boldsymbol\mu_{2a}$.
We have then

$$\boldsymbol\mu_{4a} - \boldsymbol\mu_{2a} \times \boldsymbol\mu_{2a} = \Phi^{-1}\Phi^{-1}\Phi^{-1}\Phi^{-1} :: \sum_i \boldsymbol\phi_i \boldsymbol\phi_i \boldsymbol\phi_i \boldsymbol\phi_i\, (\mu_4 - 3\sigma^4) \tag{60}$$

The tetradic $\boldsymbol\mu_{4a} - \boldsymbol\mu_{2a} \times \boldsymbol\mu_{2a}$ may be shown to vanish for the normal
distribution of the $k$ unknowns. It will vanish if the $\delta$’s are
distributed normally because then $\mu_4 = 3\sigma^4$. It will vary as $1/n^3$ and
as the variances vary as $1/n$, the kurtosis coefficients would vary as
$1/n$. Hence so far as the fourth moment we have the generalization
under very moderate restrictions of Edgeworth’s Theorem to any
least squares solution: The distribution of the unknowns tends to
normality (in $k$ variables) when the size $n$ of the sample increases
indefinitely.

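67. The Case of k Unknowns with Weights. The problem is to
minimize

$$S_j = \sum_i w_i (\mathbf a_j \cdot \boldsymbol\phi_i - E_{ji})^2, \qquad i = 1, 2, \ldots, n \tag{61}$$

the results are

$$\mathbf a_j = \Big(\sum_i w_i \boldsymbol\phi_i \boldsymbol\phi_i\Big)^{-1} \cdot \sum_i w_i \boldsymbol\phi_i E_{ji} \tag{62}$$

$$\mathbf a_j - \mathbf a_m = \Big(\sum_i w_i \boldsymbol\phi_i \boldsymbol\phi_i\Big)^{-1} \cdot \sum_i w_i \boldsymbol\phi_i \delta_{ji} \tag{63}$$

$$\boldsymbol\mu_{2a} = \Big(\sum_i w_i \boldsymbol\phi_i \boldsymbol\phi_i\Big)^{-1} \cdot \sum_i w_i^2 \boldsymbol\phi_i \boldsymbol\phi_i \sigma_i^2 \cdot \Big(\sum_i w_i \boldsymbol\phi_i \boldsymbol\phi_i\Big)^{-1} \tag{64}$$

It is to be observed that $\Phi = \sum_i \boldsymbol\phi_i \boldsymbol\phi_i$ has become replaced by $\sum_i w_i \boldsymbol\phi_i \boldsymbol\phi_i$.
It is readily shown that

$$S_m = \sum_i w_i \sigma_i^2 - \sum_i w_i^2 \sigma_i^2 \boldsymbol\phi_i \boldsymbol\phi_i : \Big(\sum_i w_i \boldsymbol\phi_i \boldsymbol\phi_i\Big)^{-1} \tag{65}$$

The reductions require assumptions (i) and (iv), and if we go on to
assume $S_m = S_j$ we need (ii). What takes the place of (iii), the
constancy of $\sigma_i^2$ for different values of $i$? Clearly what is needed for
algebraic simplification is

(iii’) that $w_i \sigma_i^2$ shall be independent of $i$,

not that $\sigma_i^2$ shall be constant. That is, the weights $w_i$ must be taken
inversely proportional to the variances $\sigma_i^2$ of $\delta_{ji}$. With this
assumption the usual formula, as expressed in matrix notation, is

$$\boldsymbol\mu_{2a} = \frac{S_j}{n - k} \Big(\sum_i w_i \boldsymbol\phi_i \boldsymbol\phi_i\Big)^{-1} \tag{66}$$

Thus if we are to use the ordinary formula for the standard errors
of the unknowns we must assume that the weights and the variances
of the $\delta_{ji}$ have this specified adjustment.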

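68. The Principle of Efficiency. The matter may be put in a different
light. R. A. Fisher has laid much emphasis on the concept of
“efficiency” in statistics, by which is meant the best use of the
material, the use which gives the greatest precision to the results.
Now any linear combination of the unknowns may be written as
$\mathbf l \cdot \mathbf a$ and a change in $\mathbf l \cdot \mathbf a$ may be written as

$$\Delta(\mathbf l \cdot \mathbf a) = \mathbf l \cdot \Delta\mathbf a, \qquad [\Delta(\mathbf l \cdot \mathbf a)]^2 = \mathbf l \cdot \Delta\mathbf a\, \Delta\mathbf a \cdot \mathbf l \tag{67}$$

We have by the familiar argument $\sigma^2(\mathbf l \cdot \mathbf a) = \mathbf l \cdot \boldsymbol\mu_{2a} \cdot \mathbf l$ where $\boldsymbol\mu_{2a}$ is
given by (66). The values of $\sigma_i$ being properties of the $\delta$’s in the $n$
universes must be regarded as fixed even if unknown. The question
may be asked, what system of weights is the most efficient, i. e., what
system will make the variance $\sigma^2(\mathbf l \cdot \mathbf a)$ of a linear function of the
unknowns as small as may be, and thus make the determination of
the unknowns as precise as possible. We may differentiate (67) with
respect to each of the $n$ quantities $w_i$ and show that the conditions
resulting from the vanishing of the derivatives are equivalent to
$w_i \sigma_i^2 = \text{const.}$, but a purely algebraic proof may be given. As a
constant factor applied to the weights will not affect the value of
$\mathbf l \cdot \boldsymbol\mu_{2a} \cdot \mathbf l$ we may take $w_i \sigma_i^2 = 1$ for the case where $w_i \sigma_i^2$ is constant;
then $\boldsymbol\mu_{2a} = (\sum_i w_i \boldsymbol\phi_i \boldsymbol\phi_i)^{-1}$. Now let $w_i = u_i \sigma_i^{-2}$. The difference of
the variances in the two cases is

$$\mathbf l \cdot \Big[ \Big(\sum_i u_i \sigma_i^{-2} \boldsymbol\phi_i \boldsymbol\phi_i\Big)^{-1} \cdot \sum_i u_i^2 \sigma_i^{-2} \boldsymbol\phi_i \boldsymbol\phi_i \cdot \Big(\sum_i u_i \sigma_i^{-2} \boldsymbol\phi_i \boldsymbol\phi_i\Big)^{-1} - \Big(\sum_i \sigma_i^{-2} \boldsymbol\phi_i \boldsymbol\phi_i\Big)^{-1} \Big] \cdot \mathbf l$$

This may be written as

$$\sum_i \Big\{ \mathbf l \cdot \Big[ \Big(\sum_h u_h \sigma_h^{-2} \boldsymbol\phi_h \boldsymbol\phi_h\Big)^{-1} \cdot (u_i \sigma_i^{-1} \boldsymbol\phi_i) - \Big(\sum_h \sigma_h^{-2} \boldsymbol\phi_h \boldsymbol\phi_h\Big)^{-1} \cdot \sigma_i^{-1} \boldsymbol\phi_i \Big] \Big\}^2$$

which is a sum of squares and cannot be negative, nor in general
zero unless $u_i$ be constant. Hence any assignment of weights other
than $w_i \sigma_i^2 = \text{const.}$ will increase $\sigma^2(\mathbf l \cdot \mathbf a)$ and thus diminish the
precision.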
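
The loss of precision from ill-chosen weights is easily seen in the case of a single unknown, where (64) reduces to $\sigma_a^2 = \sum_i w_i^2 \phi_i^2 \sigma_i^2 / (\sum_i w_i \phi_i^2)^2$. The sketch below evaluates this scalar form; the variances and weights are invented for illustration.

```python
def variance_with_weights(phi, sigma2, w):
    """Variance of one unknown under weights w: the scalar form of (64)."""
    num = sum(wi * wi * p * p * s for wi, p, s in zip(w, phi, sigma2))
    den = sum(wi * p * p for wi, p in zip(w, phi)) ** 2
    return num / den

phi = [1.0, 2.0, 3.0]
sigma2 = [1.0, 4.0, 9.0]       # hypothetical unequal variances
equal = variance_with_weights(phi, sigma2, [1.0, 1.0, 1.0])
efficient = variance_with_weights(phi, sigma2, [1.0 / s for s in sigma2])
other = variance_with_weights(phi, sigma2, [3.0, 1.0, 2.0])
print(equal, efficient, other)
```

In this example the weights inversely proportional to the variances give the smallest of the three values, in agreement with the algebraic argument above.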

69. Some Comments. Throughout these demonstrations it has
been assumed that it is the mean values $E_{mi}$ which fit the “law”
exactly. This tacitly assumes that the means are the truly
representative values. It is well known in statistical work that this assumption
may not be true, that the median or the mode or the geometric mean
may be preferable. In the same manner although we have used the
variance as a measure of scatter, other measures may be preferred.
It can only be remarked that the method of least squares does in fact
use means and variances. This may be due largely to the fact that
those quantities are the easiest to manipulate algebraically. If we
had to reduce observations in which the geometric means were
considered as better than arithmetic means we should presumably regard
the observation as $\log E_{ji}$ rather than as $E_{ji}$ and fit some expression to
$\log E_{ji}$ adjusting the weights inversely as the squares of the variances
of $\log E_{ji}$, and consider this as the best approximation we could
make to the conditions (i), (ii), (iii), (iv). There is, however, one
assumption that might perhaps be bettered, and that is (ii) the
replacement of $S_m$ by $S_j$. The quantity $S_j$ averages at $S_m$, but a specific
value of $S_j$ owing to the skewness of the distribution of $S_j$ may most
probably be less than $S_m$.

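The variance of $S_j$ may readily be found. This will give an
estimate of the error made by replacing $S_j$ by $S_m$ and is of more
importance than to determine the most probable value of $S_j$ which
would depend on third moments, would be complicated, and we should
still need some estimate of variability due to sampling.

$$\begin{aligned}
S_j^2 = {} & \sum_i \delta_{ji}^4 + \sum_{i,h} \delta_{ji}^2 \delta_{jh}^2 \\
& - 2\Phi^{-1} : \Big[\sum_i \boldsymbol\phi_i \boldsymbol\phi_i \delta_{ji}^4 + \sum_{i,h} \boldsymbol\phi_h \boldsymbol\phi_h \delta_{ji}^2 \delta_{jh}^2 + \cdots\Big] \\
& + \Phi^{-1}\Phi^{-1} :: \Big[\sum_i \boldsymbol\phi_i \boldsymbol\phi_i \boldsymbol\phi_i \boldsymbol\phi_i \delta_{ji}^4 + \sum_{i,h} (\boldsymbol\phi_i \boldsymbol\phi_i \boldsymbol\phi_h \boldsymbol\phi_h + \boldsymbol\phi_i \boldsymbol\phi_h \boldsymbol\phi_i \boldsymbol\phi_h + \boldsymbol\phi_i \boldsymbol\phi_h \boldsymbol\phi_h \boldsymbol\phi_i) \delta_{ji}^2 \delta_{jh}^2 + \cdots\Big]
\end{aligned}$$

where the terms designated by $\cdots$ involve a $\delta$ linearly and vanish in
the mean over $j$. Now

$$\sum_{i,h} \boldsymbol\phi_i \boldsymbol\phi_i \boldsymbol\phi_h \boldsymbol\phi_h = \sum_i \boldsymbol\phi_i \boldsymbol\phi_i \sum_h \boldsymbol\phi_h \boldsymbol\phi_h - \sum_i \boldsymbol\phi_i \boldsymbol\phi_i \boldsymbol\phi_i \boldsymbol\phi_i \tag{68}$$

and

$$\Phi^{-1}\Phi^{-1} :: \sum_{i,h} (\boldsymbol\phi_i \boldsymbol\phi_i \boldsymbol\phi_h \boldsymbol\phi_h + \boldsymbol\phi_i \boldsymbol\phi_h \boldsymbol\phi_i \boldsymbol\phi_h + \boldsymbol\phi_i \boldsymbol\phi_h \boldsymbol\phi_h \boldsymbol\phi_i) = \Phi^{-1} \times \Phi^{-1} :: \sum_{i,h} \boldsymbol\phi_i \boldsymbol\phi_i \boldsymbol\phi_h \boldsymbol\phi_h$$

where the sign $\times$ again denotes the specially defined product which
is symmetrical in the units. Hence, for the mean of $S_j^2$ over $j$,

$$\begin{aligned}
\overline{S_j^2} = {} & n\mu_4 + n(n - 1)\sigma^4 - 2k\mu_4 - 2k(n - 1)\sigma^4 \\
& + \Phi^{-1} \times \Phi^{-1} :: \Phi\Phi\, \sigma^4 \\
& - \Phi^{-1} \times \Phi^{-1} :: \sum_i \boldsymbol\phi_i \boldsymbol\phi_i \boldsymbol\phi_i \boldsymbol\phi_i\, \sigma^4 + \Phi^{-1}\Phi^{-1} :: \sum_i \boldsymbol\phi_i \boldsymbol\phi_i \boldsymbol\phi_i \boldsymbol\phi_i\, \mu_4
\end{aligned} \tag{68'}$$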
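A simple way to calculate $\Phi^{-1} \times \Phi^{-1} :: \Phi\Phi$ is to consider that $\Phi$ is
expressed in terms of orthogonal units as

$$\Phi = c_1 \mathbf u_1 \mathbf u_1 + c_2 \mathbf u_2 \mathbf u_2 + \cdots + c_k \mathbf u_k \mathbf u_k, \qquad \Phi^{-1} = \frac{\mathbf u_1 \mathbf u_1}{c_1} + \frac{\mathbf u_2 \mathbf u_2}{c_2} + \cdots + \frac{\mathbf u_k \mathbf u_k}{c_k}$$

$$\Phi^{-1} \times \Phi^{-1} = 3\sum_i \frac{\mathbf u_i \mathbf u_i \mathbf u_i \mathbf u_i}{c_i^2} + \sum_{i,h} \frac{\mathbf u_i \mathbf u_i \mathbf u_h \mathbf u_h + \mathbf u_i \mathbf u_h \mathbf u_i \mathbf u_h + \mathbf u_i \mathbf u_h \mathbf u_h \mathbf u_i}{c_i c_h}$$

Multiplied by $\Phi\Phi$ the last two terms give nothing; the first term
gives $3k$ and the second $k(k - 1)$. Moreover as $\sum_i \boldsymbol\phi_i \boldsymbol\phi_i \boldsymbol\phi_i \boldsymbol\phi_i$ is
completely symmetrical the result of applying $\Phi^{-1} \times \Phi^{-1}$ is the same as
that of applying $3\Phi^{-1}\Phi^{-1}$. Hence

$$\begin{aligned}
\overline{S_j^2} = {} & \Big[n - 2k + \Phi^{-1}\Phi^{-1} :: \sum_i \boldsymbol\phi_i \boldsymbol\phi_i \boldsymbol\phi_i \boldsymbol\phi_i\Big] \mu_4 \\
& + \Big[n(n - 1) - 2k(n - 1) + k^2 + 2k - 3\Phi^{-1}\Phi^{-1} :: \sum_i \boldsymbol\phi_i \boldsymbol\phi_i \boldsymbol\phi_i \boldsymbol\phi_i\Big] \sigma^4
\end{aligned} \tag{69}$$

Subtracting $S_m^2 = (n - k)^2 \sigma^4$ gives

$$\sigma_S^2 = \Big[n(\beta_2 - 1) - 2k(\beta_2 - 2) + (\beta_2 - 3)\, \Phi^{-1}\Phi^{-1} :: \sum_i \boldsymbol\phi_i \boldsymbol\phi_i \boldsymbol\phi_i \boldsymbol\phi_i\Big] \sigma^4 \tag{70}$$

The last term varies as $1/n$ and vanishes when the distribution of the
$\delta$’s is normal; it is presumably of small account in any case. Hence
one may write approximately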













$$S \pm \sigma_S = \Big[n - k \pm \sqrt{(\beta_2 - 1)n - (\beta_2 - 2)2k}\,\Big] \sigma^2 \tag{71}$$

which in the normal case reduces as is well known to

$$S \pm \sigma_S = (n - k)\Big[1 \pm \sqrt{2} \big/ \sqrt{n - k}\,\Big] \sigma^2 \tag{71'}$$

If however the $\delta$’s were distributed upon a frequency function which
gave $\beta_2 = 6$

$$S \pm \sigma_S = \Big[n - k \pm \sqrt{5n - 8k}\,\Big] \sigma^2$$

It may therefore be inferred that in most cases arising in fitting
curves by least squares the determination of the variances of the
unknowns is at best tolerably poor.
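
To see how poor, the ratio $\sigma_S / S_m$ implied by (71) can be evaluated for a short series. The sketch below is illustrative; the choice $n = 13$, $k = 3$ (thirteen observations and three fitted constants) is an assumed example and the function name is not from the text.

```python
import math

def relative_spread_of_S(n, k, beta2):
    """sigma_S / S_m from (71), neglecting the term of order 1/n in (70)."""
    return math.sqrt((beta2 - 1.0) * n - (beta2 - 2.0) * 2.0 * k) / (n - k)

normal_case = relative_spread_of_S(13, 3, beta2=3.0)    # equals sqrt(2/(n-k))
heavy_tails = relative_spread_of_S(13, 3, beta2=6.0)    # sqrt(5n - 8k)/(n - k)
print(normal_case, heavy_tails)
```

Under these assumptions the sum of squares, and with it every variance computed from (57), is uncertain by roughly 45 per cent in the normal case and more when the kurtosis is larger.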

70. The Non-linear Case. Non-linearity may arise with respect to 
the unknowns or the observations or both. In fitting $Ce^{nt}$ to $E$ by
minimizing $\sum (Ce^{nt} - E)^2$ the observations $E$ enter linearly, so does
the unknown $C$, but the unknown $n$ enters non-linearly. If one takes
logarithms and fits $nt + a$ where $a = \log C$ to $\log E$ by minimizing
$\sum (nt + a - \log E)^2$ the two unknowns enter linearly but the
observations enter non-linearly; and if it be desired to use weights so as better
to approach the solution of the former case by minimizing
$\sum E^2 (nt + a - \log E)^2$ the quantity $E$ enters in an additional manner. The
assumptions which have been introduced will not apply identically
to the several cases. Thus if the means $E_{mi}$ of $E_{ji}$ lie precisely on a
curve $Ce^{nt_i}$, the means of $\log E_{ji}$ will not in general lie precisely on a
line $nt_i + a$ because the mean of the logarithm is not the logarithm
of the mean. Also the condition that $w_i \sigma_i^2$ be independent of $i$ will
not apply precisely to the two cases. A frequent non-linear case is
that of the minimization of the sum of the squares of the relative
residuals; another arises if one undertakes to minimize Pearson’s $\chi^2$.
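
The two logarithmic fits described above differ only in their weights and may be sketched together. The code is an illustration under stated assumptions: natural logarithms are used, the function names are invented, and the data are hypothetical.

```python
import math

def weighted_line_fit(t, y, w):
    """Minimize sum w*(n*t + a - y)^2 and return (n, a)."""
    sw = sum(w)
    st = sum(wi * ti for wi, ti in zip(w, t))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    stt = sum(wi * ti * ti for wi, ti in zip(w, t))
    sty = sum(wi * ti * yi for wi, ti, yi in zip(w, t, y))
    det = sw * stt - st * st
    return (sw * sty - st * sy) / det, (stt * sy - st * sty) / det

def fit_exponential(t, E, use_weights):
    """Fit C*exp(n*t) through log E, optionally with weights E^2."""
    y = [math.log(e) for e in E]
    w = [e * e for e in E] if use_weights else [1.0] * len(E)
    n, a = weighted_line_fit(t, y, w)
    return math.exp(a), n

t = [0.0, 1.0, 2.0, 3.0, 4.0]
E = [2.1, 3.2, 5.6, 8.7, 15.1]      # hypothetical observations
print(fit_exponential(t, E, use_weights=False))
print(fit_exponential(t, E, use_weights=True))
```

The unweighted fit gives equal importance to relative errors, while the weights $E^2$ emphasize the larger observations and so come nearer to the direct fit of $Ce^{nt}$ to $E$; on data lying exactly on an exponential the two agree.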

The sum $S_j$ to be minimized is always a sum over $i$ of a function
$S_{ji}$ of the unknowns $a_1, a_2, \ldots, a_k$, symbolized as $\mathbf a$, and of $E_{ji}$ with
the characteristic that there exists a value $\mathbf a_0$ such that $S_{ji}(\mathbf a_0, E_{mi})$
vanishes, i. e., we assume the means $E_{mi}$ all lie on the locus fitted to
the observations. The function $S_{ji}$, except when it vanishes, is
positive. An expression which seems to include all those which might
arise in least squares is

$$S_{ji}(a_1, \ldots, a_k, E_{ji}) = w_i(a_1, \ldots, a_k, E_{ji})\, [f_i(a_1, \ldots, a_k) - \phi(E_{ji})]^2 \tag{72}$$

where $w_i$ and $f_i$ contain also the index $i$. The function $w_i$ is positive
and $f_i(a_{10}, \ldots, a_{k0}) - \phi(E_{mi}) = 0$ for every $i$. We may expand $S_{ji}$
about the values $\mathbf a_0$ and $E_{mi}$ by Taylor’s Theorem. The derivatives
of $S_{ji}$ for $\mathbf a = \mathbf a_0$ and $E_{ji} = E_{mi}$ are readily computed


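$$\begin{aligned}
\Big(\frac{\partial S_{ji}}{\partial a_h}\Big)_0 &= \Big(\frac{\partial w_i}{\partial a_h}\Big)_0 [f_{i0} - \phi(E_{mi})]^2 + 2w_{i0} [f_{i0} - \phi(E_{mi})] \Big(\frac{\partial f_i}{\partial a_h}\Big)_0 = 0 \\
\Big(\frac{\partial S_{ji}}{\partial E_{ji}}\Big)_0 &= \Big(\frac{\partial w_i}{\partial E_{ji}}\Big)_0 [f_{i0} - \phi(E_{mi})]^2 - 2w_{i0} [f_{i0} - \phi(E_{mi})] \Big(\frac{d\phi}{dE_{ji}}\Big)_0 = 0 \\
\Big(\frac{\partial^2 S_{ji}}{\partial a_h^2}\Big)_0 &= \Big(\frac{\partial^2 w_i}{\partial a_h^2}\Big)_0 [f_{i0} - \phi(E_{mi})]^2 + 4\Big(\frac{\partial w_i}{\partial a_h}\Big)_0 [f_{i0} - \phi(E_{mi})] \Big(\frac{\partial f_i}{\partial a_h}\Big)_0 \\
&\quad + 2w_{i0} \Big(\frac{\partial f_i}{\partial a_h}\Big)_0^2 + 2w_{i0} [f_{i0} - \phi(E_{mi})] \Big(\frac{\partial^2 f_i}{\partial a_h^2}\Big)_0 = 2w_{i0} \Big(\frac{\partial f_i}{\partial a_h}\Big)_0^2 \\
\Big(\frac{\partial^2 S_{ji}}{\partial a_h \partial a_g}\Big)_0 &= 2w_{i0} \Big(\frac{\partial f_i}{\partial a_h}\Big)_0 \Big(\frac{\partial f_i}{\partial a_g}\Big)_0 \\
\Big(\frac{\partial^2 S_{ji}}{\partial a_h \partial E_{ji}}\Big)_0 &= -2w_{i0} \Big(\frac{\partial f_i}{\partial a_h}\Big)_0 \Big(\frac{d\phi}{dE_{ji}}\Big)_0 \\
\Big(\frac{\partial^2 S_{ji}}{\partial E_{ji}^2}\Big)_0 &= 2w_{i0} \Big(\frac{d\phi}{dE_{ji}}\Big)_0^2
\end{aligned}$$

In vector notation if we designate differentiation by $\mathbf a$ by $\nabla$ and by
$E_{ji}$ by $D$ and differentiation by both by $\Diamond$, we may write

$$\begin{aligned}
(\Diamond\Diamond S_{ji})_0 &= (\nabla\nabla S_{ji})_0 + (\nabla D S_{ji})_0\, \mathbf u_{k+i} + \mathbf u_{k+i}\, (\nabla D S_{ji})_0 + (DD S_{ji})_0\, \mathbf u_{k+i} \mathbf u_{k+i} \\
&= 2w_{i0} (\nabla f_i \nabla f_i)_0 - 2w_{i0} (D\phi)_0 [(\nabla f_i)_0\, \mathbf u_{k+i} + \mathbf u_{k+i}\, (\nabla f_i)_0] + 2w_{i0} (D\phi)_0^2\, \mathbf u_{k+i} \mathbf u_{k+i}
\end{aligned} \tag{73}$$

where $\nabla$ carries the units $\mathbf u_1, \ldots, \mathbf u_k$ assigned to $a_1, \ldots, a_k$ and
$\mathbf u_{k+i}$ is the unit assigned to $E_{ji}$. It is observable that

$$(\nabla\nabla S_{ji})_0 (DD S_{ji})_0 = (\nabla D S_{ji}\, \nabla D S_{ji})_0 \tag{74}$$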


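and it will be assumed that this relation is satisfied by any function
$S_{ji}$ which enters into the specification of a least squares problem.
Now the sum to be minimized may be written with $\boldsymbol\alpha = \mathbf a - \mathbf a_0$ as

$$S_j = \tfrac12 \boldsymbol\alpha\boldsymbol\alpha : \sum_i (\nabla\nabla S_{ji})_0 + \boldsymbol\alpha \cdot \sum_i \delta_{ji} (\nabla D S_{ji})_0 + \tfrac12 \sum_i \delta_{ji}^2 (D^2 S_{ji})_0 + \cdots$$

where the derivatives being taken for the values $\boldsymbol\alpha = 0$ and $\delta_{ji} = 0$
are definite constant quantities, though unknown because the means
$E_{mi}$ and the value $\mathbf a_0$ are unknown. The expression is therefore only
of theoretical value as a base from which to work. If the expansion
be stopped with the quadratic terms the results are as in the linear
case.

$$\boldsymbol\alpha_j = -\Big(\sum_i \nabla\nabla S_{ji}\Big)_0^{-1} \cdot \sum_i \delta_{ji} (\nabla D S_{ji})_0 \tag{75}$$

As $\delta_{ji}$ occurs linearly, the mean of $\boldsymbol\alpha_j$ is zero or $\mathbf a_m = \mathbf a_0$. For the
variance

$$\boldsymbol\alpha_j \boldsymbol\alpha_j = \Big(\sum_i \nabla\nabla S_{ji}\Big)_0^{-1} \Big(\sum_i \nabla\nabla S_{ji}\Big)_0^{-1} : \sum_i \delta_{ji} (\nabla D S_{ji})_0 \sum_i \delta_{ji} (\nabla D S_{ji})_0$$

$$\overline{\boldsymbol\alpha_j \boldsymbol\alpha_j} = \boldsymbol\mu_{2a} = \Big(\sum_i \nabla\nabla S_{ji}\Big)_0^{-1} \Big(\sum_i \nabla\nabla S_{ji}\Big)_0^{-1} : \sum_i (\nabla D S_{ji})_0 (\nabla D S_{ji})_0\, \sigma_i^2 \tag{76}$$

There is, in this form, no indication of cancellation whatever the
value of $\sigma_i^2$. We may, however, use (74). Then

$$\boldsymbol\mu_{2a} = \Big(\sum_i \nabla\nabla S_{ji}\Big)_0^{-1} \Big(\sum_i \nabla\nabla S_{ji}\Big)_0^{-1} : \sum_i (\nabla\nabla S_{ji})_0 (D^2 S_{ji})_0\, \sigma_i^2 \tag{77}$$

Now if it be assumed that $(D^2 S_{ji})_0 \sigma_i^2 = w\sigma^2$ is constant

$$\boldsymbol\mu_{2a} = \Big(\sum_i \nabla\nabla S_{ji}\Big)_0^{-1} w\sigma^2 \tag{77'}$$

Similarly with the use of (74) one may show that the mean of $S_j$ is

$$S_m = \tfrac12 (n - k) w\sigma^2 \quad \text{and} \quad \boldsymbol\mu_{2a} = \frac{2S_m}{n - k} \Big(\sum_i \nabla\nabla S_{ji}\Big)_0^{-1} \tag{78}$$

The solution is as in the former case with $(\sum_i \tfrac12 \nabla\nabla S_{ji})_0$ taking the
place of $\sum_i \boldsymbol\phi_i \boldsymbol\phi_i$ or $\sum_i w_i \boldsymbol\phi_i \boldsymbol\phi_i$. The quantities $S_m$ and $\sum_i (\nabla\nabla S_{ji})_0$
being both unknown can only be replaced by values presumably near
to them, viz., the actual sum $S_j$ and the second derivative $\nabla\nabla S_j$
taken at the values of the unknowns which minimize $S_j$. But there
is a difference here which may be noted; in the linear case $\nabla\nabla S_j$ is
constant, i. e., independent of the sample, whereas in the non-linear
case it depends on the sample.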
For the general form of function (72) assumed for least squares

$$w\sigma^2 = (D^2 S_{ji})_0 \sigma_i^2 = 2w_{i0} (D\phi)_0^2 \sigma_i^2$$

and $w_{i0} (D\phi)_0^2 \sigma_i^2$ must be constant.
Thus the values of the variances in the $n$ universes do not vary
inversely with the values of the weights unless $D\phi$ is constant. For
example if one were fitting $nt + a$ to $\log E$ without weights one
should expect that $\sigma_i^2$ varied inversely as $(1/E)^2$ or directly as $E^2$; but
if one takes the weights as $E^2$, then $\sigma_i^2$ should be constant. These
results are as might be expected. If one were minimizing

$$\sum_i \Big(\frac{1}{a_j} - \frac{1}{E_{ji}}\Big)^2,$$

$a_j$ being the harmonic mean of $E_{ji}$,
the variances $\sigma_i^2$ must be supposed to vary as $E^4$; if in fact the
variances were the same in the different universes one would have to use
weights as $E^4$ if he would apply the ordinary formulas for the variance
of the unknowns.

71. A Second Approximation. The next step is to remove the
restriction of the expansion to terms of the second order. As the
formulas become involved it will be desirable to limit the discussion
to the case of one variable. Further let us suppress the $S_{ij}$ and the
subscript 0 and write $(\nabla\nabla S_{ij})_0$ as simply $\nabla_i^2$, $(\nabla D S_{ij})_0$ as $\nabla D_i$, etc.,
keeping the index $i$ to indicate that the values are different for
different indices $i$ though taken for $a = \bar a$ and $E_{ij} = \bar E_i$ and thus
independent of the sample. (The symbol $\nabla$, retained to denote
differentiation by $a$, has now no vectorial significance.) Then, including
terms of the third order,


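$$S_j = \tfrac12\alpha_j^2\Sigma_i\nabla_i^2 + \alpha_j\Sigma_i\delta_{ij}\nabla D_i + \tfrac12\Sigma_i\delta_{ij}^2 D_i^2 + \tfrac16\alpha_j^3\Sigma_i\nabla_i^3 + \tfrac12\alpha_j^2\Sigma_i\delta_{ij}\nabla^2 D_i + \tfrac12\alpha_j\Sigma_i\delta_{ij}^2\nabla D_i^2 + \tfrac16\Sigma_i\delta_{ij}^3 D_i^3$$

Differentiate and set the derivative equal to zero:

$$0 = \alpha_j\Sigma_i\nabla_i^2 + \Sigma_i\delta_{ij}\nabla D_i + \tfrac12\alpha_j^2\Sigma_i\nabla_i^3 + \alpha_j\Sigma_i\delta_{ij}\nabla^2 D_i + \tfrac12\Sigma_i\delta_{ij}^2\nabla D_i^2$$

For the solution put $dS_j = 0$. The first two terms give for $\alpha_j$ the
value (75) which may be called $\alpha_{1j}$. If we set $\alpha_j = \alpha_{1j} + \epsilon_j$ we need
retain $\epsilon_j$ only in the first term. Hence

$$\alpha_j = -\frac{\Sigma_i\delta_{ij}\nabla D_i}{\Sigma_i\nabla_i^2} - \frac12\,\frac{(\Sigma_i\delta_{ij}\nabla D_i)^2\,\Sigma_i\nabla_i^3}{(\Sigma_i\nabla_i^2)^3} + \frac{(\Sigma_i\delta_{ij}\nabla D_i)(\Sigma_i\delta_{ij}\nabla^2 D_i)}{(\Sigma_i\nabla_i^2)^2} - \frac12\,\frac{\Sigma_i\delta_{ij}^2\nabla D_i^2}{\Sigma_i\nabla_i^2}$$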


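The mean value of $\alpha_j$ is no longer zero, but involves the squares of $\sigma_i$.
The mean of $\alpha_{1j}^2$ is known, that of $\alpha_j^2$ may be had by adding that of

$$2\alpha_{1j}\epsilon_j = \frac{(\Sigma_i\delta_{ij}\nabla D_i)^3\,\Sigma_i\nabla_i^3}{(\Sigma_i\nabla_i^2)^4} - 2\,\frac{(\Sigma_i\delta_{ij}\nabla D_i)^2\,\Sigma_i\delta_{ij}\nabla^2 D_i}{(\Sigma_i\nabla_i^2)^3} + \frac{(\Sigma_i\delta_{ij}\nabla D_i)(\Sigma_i\delta_{ij}^2\nabla D_i^2)}{(\Sigma_i\nabla_i^2)^2}$$

$$\overline{\alpha_j^2} = \frac{\Sigma_i(\nabla D_i)^2\sigma_i^2}{(\Sigma_i\nabla_i^2)^2} + \frac{\Sigma_i(\nabla D_i)^3\mu_{3i}\,\Sigma_i\nabla_i^3}{(\Sigma_i\nabla_i^2)^4} - 2\,\frac{\Sigma_i(\nabla D_i)^2(\nabla^2 D_i)\mu_{3i}}{(\Sigma_i\nabla_i^2)^3} + \frac{\Sigma_i\nabla D_i\,\nabla D_i^2\,\mu_{3i}}{(\Sigma_i\nabla_i^2)^2}$$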


A certain amount of change in this relation may be made by (74) but
without any evident simplification. It is clear that $\overline{\alpha_j^2}$ involves the
third derivatives of the function S and the third moments $\mu_{3i}$ of the
$\delta$'s. To obtain $\sigma_a^2$ we should subtract $(\bar\alpha_j)^2$ which would involve $\sigma_i^4$.
Now although $\mu_{3i} = \beta_{1i}\sigma_i^3$ may be small because $\beta_{1i}$ may be, so that
terms in $\sigma_i^4$ would not be negligible compared with those in $\mu_{3i}$,
it is clear that the next approximation will bring in additional terms
in $\mu_{4i} = \beta_{2i}\sigma_i^4$ and in $\sigma_i^4$ so that it is unnecessary to include terms in
$\sigma_i^4$ until that approximation is reached. If, however, we are to have
convergence which is rapid it is clear that terms like

$$\frac{\Sigma_i(\nabla D_i)^3\,\Sigma_i\nabla_i^3}{(\Sigma_i\nabla_i^2)^4}, \qquad \frac{\Sigma_i(\nabla D_i)^2(\nabla^2 D_i)}{(\Sigma_i\nabla_i^2)^3}, \qquad \frac{\Sigma_i\nabla D_i\,\nabla D_i^2}{(\Sigma_i\nabla_i^2)^2}$$

must be small compared with $\Sigma_i(\nabla D_i)^2/(\Sigma_i\nabla_i^2)^2$, and so on to
higher orders. The third derivatives in terms of the expression (72)
are

$$\nabla_i^3 = 6\left(\frac{\partial w_i}{\partial a}\right)_0\left(\frac{\partial f_i}{\partial a}\right)_0^2 + 6w_{i0}\left(\frac{\partial f_i}{\partial a}\right)_0\left(\frac{\partial^2 f_i}{\partial a^2}\right)_0, \ \text{etc.,}$$
that is, they are not simple expressions. That the correction terms 
are extremely complicated even in the simplest case is therefore ap- 
parent; if one goes on to take terms of the fourth order and more 
unknowns than one, the complications become very formidable and 
it is extremely doubtful whether there would be any practical advan- 
tages in pursuing the matter further. 

The result is that we may infer that the questions discussed in 
Arts. 27-29 relative to the expansion of the variation before squaring 
and to the exclusion or inclusion of certain presumably small terms 
in the true expansion of S and to the exclusion or inclusion of such 
terms in the Hessian of S which is used to express the value of the 
variance of the unknowns cannot be answered without comparing in 
detail expressions which are so complicated that comparison in 
general appears impracticable. It may be assumed that for any 
practical case the values of the variances might equally well be taken 
from either expression because in all cases where the expressions
differed appreciably it would be extremely doubtful whether either
of them could be considered adequate without an elaborate discussion 
of terms of higher order in the expansions for those variances. This 
corresponds to our actual findings in all cases we have examined,— 
there is little difference in the immediate vicinity of the minimum 
between the inclusion or exclusion of the extra terms. The artificial
example which we concocted⁴⁷ to show that there might be such great
differences that the solution could be found by one set of equations,
but not by the other, while informative for that purpose, does not
upset the statements just made because we were fitting $y = mx + e^{\beta}$
or $y = mx + c^2$ to the points $(0, -1)$ and $(1, 1)$ with the solution as
$y = x$. Now if the problem be set up as a sampling problem we should
assume either that $(0, -1)$ and $(1, 1)$ were merely representative points
from universes $(0, -1 + \xi)$, $(1, 1 + \eta)$, in which case the means
$y = -1$ and $y = 1$ do not lie on a line of the type considered, or
that the points $(0, -1)$ and $(1, 1)$ were sample variations from some
means $(0, a)$, $(1, b)$, in which case if $a$ be negative the means do not
lie on the type line whereas if $a$ be positive the variation to $-1$
leads across a discontinuity in representation by reals and is scarcely
a permissible variation.

72. Germany and Chicago. We may finally discuss the cases of 
Germany and Chicago. Monk and Jeter pointed out that for Chicago 
the constants of the augmented logistic turned out imaginary for 
some choices of the four points, but real for the other choices. We 
have shown that for Germany the constants appeared to be imaginary 
and have raised the question as to whether the populations could 
be regarded as fluctuations from a (real) augmented logistic. Now 
there is this difference between the two cases: A least squares pro- 
cedure applied to the case of Chicago leads to a minimum with real 
values of the constants whereas applied to the case of Germany it 
leads out to the critical case and a much better fit of the imaginary 
augmented logistic can be found. We should therefore consider that 
the logistic was truly real for the case of Chicago and that the fact 
that imaginary values were obtained for the constants when certain 
points were used to determine them did not preclude us from con- 
sidering the populations as fluctuations from a real logistic, but 
rather indicated that sort of anomaly which is not uncommon in 
statistical work, namely, the accumulation of errors due to
statistical fluctuation.⁸⁸ In the case of Germany, however, it would
appear that the augmented logistic is truly imaginary and that if for
a set of four equally spaced censuses we should happen to get real
values for the constants those values would have to be interpreted as 
due to accumulation of errors due to statistical fluctuations. In other 
words it seems reasonable to assume that if we had an infinity of 
hypothetical Chicagos the censuses for them at each date could 
naturally be assumed to have means which might lie on a real aug- 
mented logistic from which the actual censuses for one actual Chicago 
would vary by fluctuations that would be in the nature of sampling 
fluctuations; but that if we had an infinity of hypothetical Germanys 
it would be an imaginary augmented logistic that would have to be 
assumed as the locus of the line of means. At all events we hope we 
have made it clear that this sort of hypothetical sampling set up is 
at the basis of the method of least squares and must be kept in our 
mental background in thinking about the method. One further and 
final word: Granted the hypothesis about the Chicagos as a sampling 
problem, one could raise the question as to whether some sample 
Chicago might not turn up with a set of censuses for which the best 
logistic would be imaginary. There seems to be no reason why such 
might not be the case; but if it were, the result of working with that 
solution would be wholly misleading so far as revealing the behavior 
of the mean of the universe or the future behavior of that particular 
sample. That is the present situation with respect to Germany. 
If we believe that populations grow on augmented logistics, except 
for variations that can be regarded as sampling fluctuations from 
means which lie on such curves, Germany, 1816-1910, is simply an 
exceptional sample which happens to fit very closely on an imaginary 
augmented logistic. 

73. A Mistake of Edgeworth's. In a brochure Mathematical
Representation of Statistics, dated 1900, F. Y. Edgeworth reprinted four
papers⁸⁹ with an introduction. On pages 4-5 he discussed the fitting
of the frequency function

$$y = \frac{N}{\pi}\,\frac{c}{c^2 + x^2} = \frac{1}{A + Bx^2}$$
to data for the heights of 25,878 American recruits. The details of
the discussion are given on pages 23-25 in Note 3. Edgeworth knew,
of course, that the fit would be bad at best; he knew that the standard
deviation of this frequency function is infinite and the mean therefore
indeterminate, indeed he probably knew that the means of samples
of n from a universe with this frequency distribution are distributed
on the same curve as the individual elements; he knew that means of
samples of heights of recruits would show stability much greater than
the heights of the individuals. It may therefore be presumed that he
fitted the function merely for illustrative purposes. What he did was
to write $n_r = 1/(A + Bx_r^2)$ where $x_r$ is the distance from the center
of the strips in the histogram to the mean (or median) of the whole
distribution, and $n_r$ is the number of observations falling in that strip
of the histogram. Next he cleared of fractions to obtain
$A + Bx_r^2 - 1/n_r = 0$ as an equation of condition, thus getting the
equations in linear form. Then he paused briefly to discuss the weights
of these equations which he took on analogy to be $n_r$ and therefore
minimized

$$\Sigma n_r(A + Bx_r^2 - 1/n_r)^2 \tag{79}$$


finding, as he states, with some surprise a negative value for A; and
this solution he discussed, coming to the conclusion that the method
of Least Squares showed that the proposed representation was
inadmissible. He reinforced the argument for the demonstration of
inadmissibility by showing that if one wrote $A + Bx_r^2 = \log n_r$, as
would be natural when fitting the normal law $y = e^{A + Bx^2}$, and
minimized

$$\Sigma n_r(A + Bx_r^2 - \log n_r)^2$$

he obtained sensible values for A and B.
The trouble is with the weighting. In the second case the variances
of the universes for each value of r are by (iii'') inversely proportional
to

$$w_r(D\varphi)^2 = n_r(1/n_r)^2 = 1/n_r$$

and therefore proportional to $n_r$. This is a reasonable assumption,
because in fitting a frequency function where the actual numbers in
the different strips of the histogram are supposed to result by sampling
fluctuations, the variances should be approximately proportional to
the numbers themselves (being $Np_rq_r$ with $q_r$ near 1 and $Np_r = n_r$).
In the first case, however, the variances would have to be inversely
proportional to

$$w_r(D\varphi)^2 = n_r(1/n_r^2)^2 = 1/n_r^3$$

and therefore proportional to $n_r^3$ which is not sensible. The weights
are therefore not $n_r$ as in fitting by minimizing (79); it is readily seen
that the expression to be minimized should be

$$\Sigma n_r^3(A + Bx_r^2 - 1/n_r)^2 \tag{79'}$$





which will make the variances again proportional to $n_r$, and thus the
weights of the equations $A + Bx_r^2 - 1/n_r = 0$ are $n_r^3$. If these
weights be used we are really minimizing

$$\Sigma n_r(A + Bx_r^2)^2\left[n_r - \frac{1}{A + Bx_r^2}\right]^2 = \Sigma n_r(A + Bx_r^2)\,\frac{[n_r - (A + Bx_r^2)^{-1}]^2}{(A + Bx_r^2)^{-1}}$$

which is Pearson's $\chi^2$ except for the multiplier $n_r(A + Bx_r^2)$ that must
be nearly equal to 1 if the fit is at all good. With these weights
($n_r^3$ instead of $n_r$ as used by Edgeworth) the value of A does not
come out negative, it is positive; we find⁹⁰

$$\frac{1}{.000236 + .00003089\,x^2}.$$











The fit is assuredly not good, as was evident from the start, but fitting 
by least squares is not inadmissible, there are no paradoxical results. 
It is difficult to imagine how Edgeworth who was so able a mathema- 
tician and who was especially distinguished for the excellence of his 
judgment of situations, whether mathematical or factual, could have 
been misled by his erroneous solution into giving in support of it such 
explanations as he gave instead of seeing in it indications of its own
erroneous character even to the discovery of the particular error
which had been made.
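
The two weightings can be compared in a few lines of Python. This is a hedged sketch only: the data are synthetic counts generated from an assumed curve (not Edgeworth's recruit data), and the helper names `weighted_line_fit`, `edgeworth_sum` and `pearson_like_sum` are invented for the illustration. It shows the weighted linear fit of the equations of condition and checks numerically that the sum weighted by the cube of the counts equals the chi-square-like sum with the multiplier described above.

```python
def weighted_line_fit(u, z, w):
    """Minimize sum w*(A + B*u - z)^2 over A and B via the 2x2 normal equations."""
    s_w = sum(w)
    s_wu = sum(wi * ui for wi, ui in zip(w, u))
    s_wuu = sum(wi * ui * ui for wi, ui in zip(w, u))
    s_wz = sum(wi * zi for wi, zi in zip(w, z))
    s_wuz = sum(wi * ui * zi for wi, ui, zi in zip(w, u, z))
    det = s_w * s_wuu - s_wu ** 2
    A = (s_wz * s_wuu - s_wu * s_wuz) / det
    B = (s_w * s_wuz - s_wu * s_wz) / det
    return A, B


def edgeworth_sum(A, B, x, n, power):
    """Sum of n^power * (A + B*x^2 - 1/n)^2; power=1 is (79), power=3 is (79')."""
    return sum(ni ** power * (A + B * xi * xi - 1.0 / ni) ** 2
               for xi, ni in zip(x, n))


def pearson_like_sum(A, B, x, n):
    """Sum of (n/e) * (n - e)^2 / e with expected count e = 1/(A + B*x^2)."""
    total = 0.0
    for xi, ni in zip(x, n):
        e = 1.0 / (A + B * xi * xi)
        total += (ni / e) * (ni - e) ** 2 / e
    return total


# Synthetic, noiseless "counts" from an assumed curve (illustrative values only).
A_true, B_true = 2.0e-4, 3.0e-5
x = [float(k) for k in range(-5, 6)]
n = [1.0 / (A_true + B_true * xi * xi) for xi in x]
u = [xi * xi for xi in x]
z = [1.0 / ni for ni in n]

fit_n = weighted_line_fit(u, z, n)                        # weights n, as in (79)
fit_n3 = weighted_line_fit(u, z, [ni ** 3 for ni in n])   # weights n^3, as in (79')

# Identity between (79') and the chi-square-like form, at arbitrary trial constants.
lhs = edgeworth_sum(2.5e-4, 2.5e-5, x, n, power=3)
rhs = pearson_like_sum(2.5e-4, 2.5e-5, x, n)
```

With noiseless data both weightings recover the assumed constants, so any difference between them in practice arises only from how the sampling fluctuations are weighted.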


REFERENCES. 


1 Read in preliminary form before the Seminar in Zoology, University of
California, Berkeley, April 3, 1929; read in part before joint session of
Sections A and K of the Amer. Assoc. Adv. Sci., New Orleans, December 31, 1931.
Presented to the Academy, April 12, 1933.

2a Pearl, R. and Reed, L. J. Proc. Nat. Acad. Sci., 6, 275-288, 1920.

2b Knibbs, G. H. Appendix A, vol. I, Census of Commonwealth of
Australia, 1-33.

2c DuPasquier, L. G. Vierteljahrschr. Naturforsch. Ges., 63, 1918, pp.
236-249.

2d Hain in Statistics of the Austrian Empire (1852). This reference is quoted
from Edgeworth, J. Roy. Statist. Soc., 88, 1925, p. 59, because of his comments
on it; we have not had access to it.

3 It is curious to call the exponential law Malthusian when Malthus' chief
thesis was that it could not be followed, but the usage seems established.

4 The constant n has "dimensions" of the reciprocal of time and its
magnitude varies inversely to that of the unit of time, i.e., nt is a pure number.

5 Verhulst, P. F. Correspondence Mathematique et Physique par Quetelet,
vol. X, 1838, pp. 113-121. Nouv. Mem. Acad. Roy. Sci. Belles-Let.,
Bruxelles, 18, 1845, pp. 1-38; 20, 1847, pp. 1-32.

6 We may also, as Verhulst points out, integrate if $\varphi(P) = naP^{i+1}$ as
$P^{-i} = a + be^{-int}$. This is a four constant equation which reduces to the
logistic when i = 1. For the general form the population converges to $\sqrt[i]{1/a}$
and the inflection point of the plot of P on time is when $P = [a(i + 1)]^{-1/i}$.
The ratio of the population at the inflection to the asymptote is $(i + 1)^{-1/i}$
which is .5 when i = 1 and .577 when i = 2 and .630 when i = 3.
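
The ratio of the inflection population to the asymptote can be verified with a short computation. This is a minimal sketch under the assumption that the growth rate is proportional to P(1 - aP^i), which is what the generalized form above implies; the function names and the value of `a` are illustrative.

```python
def inflection_ratio(i):
    """Ratio of the population at the inflection to the asymptote: (i + 1)^(-1/i)."""
    return (i + 1) ** (-1.0 / i)


def growth_rate(P, a, i):
    """Growth rate up to the constant factor n, assuming dP/dt = n*P*(1 - a*P^i)."""
    return P * (1.0 - a * P ** i)


ratios = {i: round(inflection_ratio(i), 3) for i in (1, 2, 3)}

# Numerical check for one illustrative case that the growth rate peaks there.
a, i = 0.01, 2                                # hypothetical constants
asymptote = (1.0 / a) ** (1.0 / i)            # population limit, the i-th root of 1/a
P_star = inflection_ratio(i) * asymptote      # population at the inflection
peak = growth_rate(P_star, a, i)
neighbours = (growth_rate(0.99 * P_star, a, i), growth_rate(1.01 * P_star, a, i))
```

The dictionary `ratios` holds the three values to three decimals, and the growth rate at `P_star` exceeds that at either neighbouring population, as an inflection point of the growth curve requires.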

7 Hotelling, H., J. Amer. Statist. Assoc., 22, 1927, pp. 283-314, works
directly with the differential equation, without integration, and there are
some advantages in so doing. He also has some sound remarks about
probable errors of extrapolations. This paper should be read in connection with
that of Schultz.³² The reason that Schultz and Hotelling give such different
probable (or standard) errors of a forecast is due to their approaching the 
problem from such different viewpoints; indeed the viewpoints may be con- 
sidered independent so that the errors should be superposed. Schultz con- 
siders the standard errors of the parameters of an empirical formula such as 
the logistic when the formula is fitted to the observations by least squares, 
Hotelling considers the deviations arising by a gradual accumulation of inde- 
pendent variations in the growth rates dP/dt. 

8 Many references will be found in Pearl, R. Medical Biometry and Sta- 
tistics, 2nd Ed., 1930, pp. 427-428. We shall cite specifically those we use 
later. 

9 Yule, G. U. J. Roy. Statist. Soc., London, 88, 1925, pp. 1-62 (with
discussion by Edgeworth and others). See also Stevenson, T. H. C. Ibid. pp.
63-90 (with discussion by Yule, Bowley, and others).

10 East, E. M. Mankind at the Crossroads, Chap. IV.

11 Abstract U. S. Census of 1920, p. 584. Unfortunately for our purposes
the system of land classification seems to have been so different for the 1930
census that we are unable to get a figure for that year. The table and graphs
show that the figure for 1870 was out of line; we are particularly in need of the
figure for 1930 in order to judge as to whether that for 1920 was similarly out
of line.

12 For discussion of growth paper see Wilson, E. B., Proc. Nat. Acad. Sci., 11,
1925, pp. 451-456.

13 For a critical discussion reference may be made to East.¹⁰

14 Beloch, J. Zeitschr. Socialwiss., 2, 18, (1899), pp. 505-514, 600-621;
Ibid. 3, 18, 1900, pp. 405-423, 765-786.

15 One might even surmise that instead of approaching the asymptote
steadily a population would be far more likely to overshoot the asymptote
and settle down to it by oscillating about it, or at least to overshoot it and
approach it from the upper side.

16 Knibbs, G. H. J. Amer. Statist. Assoc., 21, 1926, pp. 381-398, Ibid., 22,
1927, pp. 49-59.

















17 The values of n were found graphically. The figures for population were
taken from Pearl,⁸ p. 421 and Reed and Pearl.³⁴

18 Wilson, E. B. and Luyten, W. J. Proc. Nat. Acad. Sci., 11, 1925, pp.
137-143.

19 Wilson, E. B. Science 61, 1925, pp. 87-89 showed this for Canada, and
numerous other cases will be noted in this paper. They are often easy to
recognize from the semi-log plot of the population by the upward concavity
of the curve (Wilson and Luyten, footnote 7).

20 Taken from Pearl, R. and Reed, L. J., Predicted Growth of Population
of New York and Its Environs, Plan of New York and Its Environs, 1923,
42 pp. (p. 27).

21 What this means graphically is that if the populations are plotted on
semi-log paper there is no obvious concavity of the set of points as a whole
either up or down (Art. 10).

22 Pearl, R. and Reed, L. J., Science, 72, 1930, pp. 399-401. They did not
claim a correction to the 1920 figure making it 106.1 instead of 105.7 due to the
fact that the census was taken as of Jan. 1 instead of as April 15. They had
a right to the benefit of this correction which would have improved their
forecast.

23 Ferenczi, I. International Migrations, vol. I, 1929, pp. 374-500.

24a Cannan, E. Economic J., 5, 1895, pp. 505-515.

24b Bowley, A. L. Economic J., 34, 1924, pp. 188-192.

24c Dublin, L. I. Forum, 86, 1931, pp. 270-276.

25 For Least Squares we may refer to Whittaker and Robinson, Calculus of
Observations, Chap. IX. It should not be necessary to state the definition of
least squares, which has been accepted for more than a century, but Pearl and
Reed²ᵃ after fitting the logistic to three censuses precisely and computing the
residuals and the square root of the sum of their squares wrote: "It must not
be forgotten, however, that the root mean square error is reduced in the present
case by virtue of the fact that in three out of the thirteen ordinates theory
and observation are made, by the procrustean method of fitting, to coincide
exactly," and the context seems to show that they meant reduced and not
increased. It would seem therefore that they thought that with three zero
residuals the sum of squares would be less than in a least squares fit. To us
this seems wrong; we believe that if three deviations are made zero the squares
of the others will add up to more than that for a least squares fit, and we
shall proceed on the assumption that Least Squares minimizes the sum of the
squares.

26 The index 7 ranges over the values of 0 to k — 1 if there are k censuses 
available. We shall drop this index in formulas subsequent to (12) and leave 
it to be understood by the reader. 

27 The $\tfrac12$ is inserted before $\partial S/\partial a$ and the other derivatives to make the
normal equations in the linear case come out as usually written. Thus if the
linear equations are

$$a_ix + b_iy + \cdots = n_i \quad\text{so that the errors are}\quad a_ix + b_iy + \cdots - n_i,$$

where $x, y, \cdots$ not $a, b, \cdots$ are the unknowns and n is the quantity
observed, the derivatives of

$$S(x, y, \cdots) = \Sigma(a_ix + b_iy + \cdots - n_i)^2$$

are

$$\partial S/\partial x = 2\Sigma(a_ix + b_iy + \cdots - n_i)a_i = 2(\Sigma a_i^2\,x + \Sigma a_ib_i\,y + \cdots - \Sigma a_in_i)$$
$$\partial S/\partial y = 2\Sigma(a_ix + b_iy + \cdots - n_i)b_i = 2(\Sigma a_ib_i\,x + \Sigma b_i^2\,y + \cdots - \Sigma b_in_i)$$

etc., and the standard normal equations are $\tfrac12\partial S/\partial x = 0$, $\tfrac12\partial S/\partial y = 0$, etc.
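
The normal equations of this note can be formed and solved directly. The sketch below is illustrative only: it restricts the case to two unknowns, the coefficients and observations are invented numbers, and the function names are assumptions made for the example.

```python
# Hypothetical equations a_i*x + b_i*y = n_i (illustrative numbers only).
a = [1.0, 1.0, 1.0, 1.0]
b = [0.0, 1.0, 2.0, 3.0]
n = [1.1, 2.9, 5.2, 6.8]


def normal_equation_sums(a, b, n):
    """Return the sums of a^2, a*b, b^2, a*n, b*n that enter the normal equations."""
    saa = sum(ai * ai for ai in a)
    sab = sum(ai * bi for ai, bi in zip(a, b))
    sbb = sum(bi * bi for bi in b)
    san = sum(ai * ni for ai, ni in zip(a, n))
    sbn = sum(bi * ni for bi, ni in zip(b, n))
    return saa, sab, sbb, san, sbn


def solve_normal_equations(a, b, n):
    """Solve saa*x + sab*y = san and sab*x + sbb*y = sbn by Cramer's rule."""
    saa, sab, sbb, san, sbn = normal_equation_sums(a, b, n)
    det = saa * sbb - sab * sab
    x = (san * sbb - sab * sbn) / det
    y = (saa * sbn - sab * san) / det
    return x, y


def half_derivatives(x, y, a, b, n):
    """Half the partial derivatives of S, i.e. the left sides of the normal equations."""
    hx = sum((ai * x + bi * y - ni) * ai for ai, bi, ni in zip(a, b, n))
    hy = sum((ai * x + bi * y - ni) * bi for ai, bi, ni in zip(a, b, n))
    return hx, hy


x, y = solve_normal_equations(a, b, n)
hx, hy = half_derivatives(x, y, a, b, n)   # both vanish, up to rounding, at the solution
```

The half derivatives evaluated at the solution vanish to rounding error, which is the sense in which the factor of one half leaves the equations "as usually written".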

28 These "weights of the residuals" in the three equations must not be
confused with the "weights of the observations" (Art. 22) in a least squares
solution or with the "weights of the unknowns" (Whittaker and Robinson,²⁵ Art.
122).

29 The conditions (11) have been interpreted neglecting mention of the
constants 1/b and 1/K which multiply the equations. Some modification of the
statement might be necessary if b or K vanished or were infinite.

30 The method of moments is not a method of least squares for the logistic.

31 Pearl, R. Studies in Human Biology, 1924, pp. 578-581. Our statement
as to the conditions which are equivalent to his method is based on the
procedure of the computation form. It seems difficult to state precisely what the
conditions would be on the basis of the algebraic theory which leads up to
that form because the author seems to have confused E and P and thus to
have been led to perform operations which are inadmissible in least squares.

32 Schultz, H., J. Amer. Statist. Assoc., 25, 1930, pp. 139-185. Schultz
states (p. 164) that it is difficult to give meaning to the "residuals" Pearl and
Reed are minimizing. We hope that the discussion in Art. 21 and the example
in Art. 42, though not supplying any strict meaning to those residuals, will
throw some light on the difficulty.

33 To illustrate the difference which would arise by putting E = P in the
weights of the residuals we may consider the first of conditions (14). The
difference is

$$\Sigma E(E - P) - \Sigma P(E - P) = \Sigma(E - P)^2 = S.$$

Thus if we replace $\Sigma P(E - P)$ by $\Sigma E(E - P)$ and then make the latter
vanish, $\Sigma P(E - P)$ will not vanish but will be negative and equal to $-S$.
In case the fit is very good S will be small and the conditions will be practically
equivalent.
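
The identity used in this note is elementary and can be confirmed on any pair of series. In the sketch below the enumerated and computed populations are invented numbers chosen only for illustration.

```python
E = [3.9, 5.3, 7.2, 9.6]   # hypothetical enumerated populations
P = [4.0, 5.2, 7.3, 9.5]   # hypothetical computed populations

sum_E_weighted = sum(e * (e - p) for e, p in zip(E, P))   # sum of E*(E - P)
sum_P_weighted = sum(p * (e - p) for e, p in zip(E, P))   # sum of P*(E - P)
S = sum((e - p) ** 2 for e, p in zip(E, P))               # sum of squared residuals

difference = sum_E_weighted - sum_P_weighted              # equals S up to rounding
```

Whatever the series, `difference` agrees with `S`, so making one of the two weighted sums vanish leaves the other equal in magnitude to the sum of squares.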

34 It should not be necessary to define weights but Reed and Pearl writing 
in the Journal of the Royal Statistical Society of London, 90, 1927, pp. 729- 
746, state definitely (p. 742) ‘‘The use of absolute residuals weights each 
observation in proportion to its magnitude. This is in accord with the cus- 
tomary usage in the fitting of curves.’’ To us this seems incorrect. We be- 
lieve it would have seemed incorrect to two great predecessors of Pearl and 
Reed at the Johns Hopkins University, C. S. Peirce and Simon Newcomb,
who were expert in least squares and in the reduction of observations. The
definitions which we give would require us to minimize $\Sigma E(E - P)^2$ if we
were to weight each observation in proportion to its magnitude.

35 Wilson and Luyten,¹⁸ while not actually fitting by least squares
procedures, did try to render small the sum of the squares of the relative residuals
(rather than that of the absolute residuals). They did not, however, advocate 
this method as preferable. The chief reason for their choice of relative re- 
siduals was that they were working graphically on semi-log paper with method 
(8) without making allowances for change of scale in the different parts of the 
graph. They state in their footnote 15 that they believe the Pearl-Reed 
values (for Greater New York) are very near to those obtained by basing the 
least squares criterion on actual instead of percentage differences. We shall 
find that they erred in this belief: The solution by actual differences gives 
the limiting population 23.4 whereas Pearl and Reed found 34.9. Wilson 
and Luyten’s approximate relative difference solution assumed 16.7 whereas 
the least squares solution gives 16.2 (Art. 55). 

36 Pearson actually fits his frequency functions by the method of moments
and then applies $\chi^2$ as a test of goodness of fit. For a discussion of this matter
see Fisher, R. A., Phil. Trans. Roy. Soc., London, Series A, 222, pp. 309-368. 

37 Of course if the fit is really close the different systems of weighting all 
lead to essentially the same values of the empirical constants. Whenever the 
fitted curves lead to distinctly different results with different systems of 
weighting in the least squares solution it is obvious that inferences from those 
results are largely dependent upon the appropriateness of different systems of 
weighting. 

38 Generally done graphically on semi-log paper or on ordinary paper using
a logarithmic scale.

39 Obviously the unweighted logarithmic fit is very nearly equivalent to
fitting by relative residuals. It is not absolutely the equivalent of such a fit,
as sometimes stated, because E − P are not really infinitesimals and to be
assimilated to differentials, they are finite.

40 We drop here and hereafter the notation $(\ )_0$, leaving it to be understood.

41 As the ordinary conditions for a minimum are $\partial^2S/\partial C^2 > 0$, $\partial^2S/\partial n^2 > 0$
and $\tfrac14H = \tfrac12\partial^2S/\partial C^2 \cdot \tfrac12\partial^2S/\partial n^2 - (\tfrac12\partial^2S/\partial C\,\partial n)^2 > 0$, the determinant of the two
equations in $\delta C$ and $\delta n$ cannot vanish at the minimum, nor (by virtue of
continuity) for some finite though possibly small region around the minimum,
and hence if we are near enough to that position the solution of the equations
is possible. (It is possible to have a minimum for which the stated conditions
vanish as in the case $S = C^4 + n^4$, but this requires special and very
complicated considerations; fortunately it is not likely to arise in practical least
squares solutions.)

42 Whittaker and Robinson,²⁵ p. 214. Schultz³² follows this same procedure.

43 Except such an artificial case as is discussed in footnote 47.

44 The conditions for the minimum being $\partial S/\partial C = \partial S/\partial n = 0$ our problem
is to solve those equations. It is to be noted that the terms at the extreme
left of equations (22), (24), (25), are all identical, being in fact the half
derivatives of S, and only the coefficients of $\delta C$ and $\delta n$ are affected by using the
modification of the approximation equations. If then we can get from either
set of equations, or from any others, values of $\delta C$ and $\delta n$ which make the
derivatives vanish, we have solved our problem, and the simpler the equations
we can use, having regard to the rapidity of the convergence of these
derivatives toward 0, the better.

45 This statement is perhaps too strong. We have never found in practice 
a case in which the standard errors computed from (26) with and without the 
correction terms dependent on E − P have differed appreciably. We are not
sure just what might be considered the best theory of least squares, including 
the standard errors of the unknowns, when the unknowns enter the equation 
non-linearly. We are not convinced of the appropriateness of a least squares 
solution in such cases or even in some cases in which the constants do enter 
linearly. As, however, there is a tendency to express conditions for a particu- 
lar state by minimizing something, the probability is that the problem of 
fitting will be expressed as the minimization of some function, and it would 
appear as though the various estimates of goodness of fit and of the precision 
of the constants as determined should be in terms of the same function. 
The discussion offered in the Appendix seems to indicate that when the con- 
stants enter non-linearly, the problem of determining the standard deviation 
of the unknowns involves serial expansions of so high a degree of complexity 
as to make any precise determination of these standard deviations wholly 
impractical and further that the use or rejection of the extra terms would 
ordinarily not be a matter of importance. 

46 Whittaker and Robinson,²⁵ p. 246. If $a_{ij}$ are the coefficients of the normal
equations and D their determinant and $A_{ij}$ their cofactors, the expressions
for the standard errors of the unknowns involve the ratios $A_{ii} : D$. We may
speak not only of the standard errors of the unknowns but of their correlation
or of their mean products, which obviously must involve $A_{ij} : D$ where $i \ne j$.
In case we solve for a and $\sigma_a$ we may get $K = 1/a$ and $\sigma_K = \sigma_a/a^2$; and if we
solve for B and C we have K = B/C immediately but we cannot get $\sigma_K$ from
B, C, $\sigma_B$, $\sigma_C$; we must have also the mean product $\Pi_{BC}$ or the correlation
coefficient $r_{BC}$. We shall have occasion to speak qualitatively of the
correlation between the variables but shall not need to develop the quantitative side
of the matter as might readily be done following Whittaker and Robinson,
Art. 171, or A. L. Bowley, Elements of Statistics, p. 405-407. These authors
treat correlation as a probability problem, but G. U. Yule, Introduction to the
Theory of Statistics, does not thus restrict it, and Whittaker and Robinson
give the treatment of least squares without such restrictions. We have
really to do with properties of quadratic forms or symmetrical systems of
linear equations, and in particular with the 'reciprocal properties' of such
forms or systems. The details of the discussion may be left to the Appendix.











47 If we try to fit $y = mx + b$ to some points and choose to write $y = mx + e^{\beta}$
we may get the wrong solution as may be seen by taking $x = 0$, $y = -1$
and $x = 1$, $y = 1$ as the (two) points. The line $y = mx + b$ is $y = 2x - 1$,
and S = 0 because the fit is perfect. With $y = mx + e^{\beta}$ we must have a
minimum, either actually attained or approached as a limit, for
$S = \Sigma(mx + e^{\beta} - y)^2$ because this quantity is necessarily positive or zero and
is continuous. The solution is $\beta = -\infty$, $m = 1$, $y = x$, $S = 1$. For any value
of m other than 1 and for any value of $\beta$ other than $-\infty$ the sum is greater than 1.
The minimum is in a sense a false minimum since $\beta = -\infty$ is not truly a
value for $\beta$, but the line $y = mx + e^{\beta}$ approaches the definite position $y = x$
as m approaches 1 and as $\beta$ becomes negatively infinite, and S approaches the
limit 1. If however we attempt to find this minimum by using the proper
approximation equations we shall find that $\delta\beta$ is near $-1$ when $\beta$ is large and
negative and m is near 1; in other words we cannot find the minimum
without an infinite number of steps.

If one objects to the value $\beta = -\infty$, the principle may be illustrated by
taking $y = mx + c^2$ as the form to fit. Then

$$\tfrac12\partial S/\partial c = 4c^3 + 2mc, \qquad \tfrac12\partial S/\partial m = m + c^2 - 1$$

The solution is $c = 0$ and $m = 1$ and the line is $y = x$. Also

$$\tfrac12\partial^2S/\partial c^2 = 12c^2 + 2m, \qquad \tfrac12\partial^2S/\partial c\,\partial m = 2c, \qquad \tfrac12\partial^2S/\partial m^2 = 1$$

The minimum is a true minimum because for $c = 0$, $m = 1$ the first and last
second derivatives are positive and the determinant H is also positive. The
approximation equations for finding the minimum are

$$4c^3 + 2mc + (12c^2 + 2m)\delta c + 2c\,\delta m = 0, \qquad m + c^2 - 1 + 2c\,\delta c + \delta m = 0$$

and $\delta c$ and $\delta m$ converge properly toward 0 with c and m converging toward 0
and 1 respectively. The formulas for the standard deviations give definite
results. If however we expand before squaring and reject $\delta c^2$ we have

$$S = (c^2 + 2c\,\delta c + 1)^2 + (m + \delta m + c^2 + 2c\,\delta c - 1)^2$$
$$= 2c^4 + m^2 - 2m + 2 + 2mc^2 + (8c^3 + 4cm)\delta c + (2m + 2c^2 - 2)\delta m + 8c^2\delta c^2 + 4c\,\delta c\,\delta m + \delta m^2$$

The "normal" equations are

$$4c^3 + 2cm + 8c^2\delta c + 2c\,\delta m = 0, \qquad m + c^2 - 1 + 2c\,\delta c + \delta m = 0$$

and $\delta c = -c/2 - 1/2c$, $\delta m = -m + 2$; there is no convergence to the
minimum at $c = 0$ and $m = 1$. These illustrations show that the minimum in a
least squares problem may depend on the form in which the constants enter
into the equation to be fitted. When the constants enter linearly, as is usual,
the expression to be minimized is a quadratic form in the constants and has a
definite minimum because it is always positive or zero and becomes
indefinitely large as the values of the constants retreat indefinitely from the values
which give the minimum. When the constants do not enter linearly the
function to be minimized though always positive or zero is not quadratic, there
may be no minimum (except at infinity), and the empirical expression may
not be fittable (with real values of the constants) to a number of points equal
to the number of constants.

48 Such a situation arises frequently in fitting logistics, the value of S may 
depart very little from its minimum and yet the errors in the unknowns may 
be considerable. 

49 If we are interested solely in the closeness of fit over the observed range 
of data, we can estimate the convergence by the behavior of S and if we do 
not care about a least squares fit we may be content with a value of S consider- 
ably removed from the minimum; but if we desire to extrapolate the curve we 
need to know the values of the constants, particularly if we desire to estimate 
the asymptotic population. Not only is the elongation of the ellipsoid for 
the logistic often extreme, but we shall see that the shape of the range of values 
for the constants over which S hovers near its minimum may be far from 
elliptical (Art. 53). 

50 Population figures taken from Yule.’ Time in decades measured from
the initial year, here and elsewhere. 

51 We give in this table and in subsequent tables the figures we got without
claiming, and indeed under the distinct understanding that we disclaim, 
validity for the last few places. Thus in our second least squares procedure 
the corrections to a and n were only about 1.5% of the next previous corrections,
whereas the correction to b was 12%. The second values of δa, δb, δn
may presumably be taken to be on the whole decidedly larger than the errors 
in the final result if further approximations were made, but it cannot be 
assumed that each of the three is decidedly larger as it might be that one or 
even two of them had happened exceptionally to be very small. In respect to 
the value of S it should be especially remarked that although the enumerated 
populations E have been taken to a limited number of places (to tens of
thousands in the present case following Yule, but generally to thousands) 
more places have generally been carried in the computed populations P in 
order to give better possibilities of estimating the change in S. 

52 It must not be overlooked that in calculating the determinant D which
goes in the denominator of the formula for the standard errors we lose a 
considerable number of places and hence the values of the standard errors 
are not particularly good. These errors were calculated from the coefficients 
of the simplified equations 

53 Population figures taken from Pearl* (p. 607).

54 Time measured from 1861 in our figures and from 1800 in Pearl’s. We
have fitted to the six censuses 1861-1910; Pearl states that he fitted from 
1855 on and we should interpret this to mean that he used the data for 1855, 
but the table gives no evidence to support such an interpretation except that 
if we include 1855 the sum of the differences for Pearl's solution is only —.0011 
and thus comes near vanishing as it should when an additive constant is
available; however, in the tabulations he does not include the calculated
value of his fit for the year 1855 and as it is seen that the error of the fit for that 
year is considerably larger than for any of the years tabulated we may infer 
that probably he did not fit to that year. 

55 To remind the reader that not all the places written are good we have in 
these four cases underlined (italicized) the places which we believe to be good. 
In making another least squares approximation we should start with all the 
places given as the values of the constants are highly correlated and a good 
many places are lost in the work and as there is very little saving in machine 
calculation by reducing the number of places. The values of σ in the last
column are $\sqrt{S/k}$, where k is the number of censuses, and thus give the
standard deviation of the residuals. (The value $\sigma' = \sqrt{S/(k - 3)}$ would of course
have to be used to estimate the “error to be feared.”) The 1920 populations
of the four regions were respectively, 1.4, 3.2, 10.4, 14.9 so the percentage 
errors based on the terminal populations are respectively 1.6, 1.3, 2.3, 1.4; 
they would be higher if based on the mean, median or mid population as 
during the period 1790-1920 the populations have increased by the respective 
factors 6, 17, 30, 20. 
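
The two standard deviations just defined, and the percentage error based on a terminal population, can be sketched as follows. The function names and the sample value of S are hypothetical; only the formulas σ = √(S/k) and σ′ = √(S/(k − 3)) come from the footnote.

```python
import math


def sigma(S, k):
    """Standard deviation of the residuals: sqrt(S / k)."""
    return math.sqrt(S / k)


def sigma_prime(S, k, constants=3):
    """'Error to be feared': sqrt(S / (k - number of fitted constants))."""
    return math.sqrt(S / (k - constants))


def percent_error(sd, terminal_population):
    """Standard deviation as a percentage of the terminal population."""
    return 100.0 * sd / terminal_population


if __name__ == "__main__":
    S_example, k_example = 0.09, 9  # hypothetical values, not from the tables
    print(sigma(S_example, k_example), sigma_prime(S_example, k_example))
```

Since k − 3 is smaller than k, σ′ always exceeds σ, and the difference matters most when the number of censuses is small.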

56 How is it that Pearl and Reed will offer as a least squares fit, or even as a 
satisfactory fit, one which can be bettered by a graphic Malthusian, one 
which is hypomalthusian whereas the series of populations plotted on semi- 
log paper is distinctly concave up, one which though containing three con- 
stants cuts the curve of observations (minor fluctuations excepted) in only 
two instead of in three or more points, in fine, one which satisfies none of the 
qualitative inspection criteria that would be applied to see whether or not the 
fit was at all possible as a least squares solution let alone the quantitative 
criteria (11) which might not be known to them? These qualitative criteria 
give the same indications for any part of the period 1790-1920 ending with 1920. 

The situation with respect to New Jersey and the other regions is not quite
so bad. The following is a summary of the Reed and Pearl asymptotic
populations obtained (p. 735) in “fitting each of these populations with
separate logistic curves by using the theory of least squares, with absolute
residuals,” and the values we find (in millions).

                 K (R. and P.)   K (Correct)   Comment
Connecticut          8.771          −.610      Curve infinite in finite time
New Jersey          15.053        −38.920      Curve infinite in finite time
New York            26.596         25.370      Nearly right
Three States        49.510         77.330      Error is −36%


The other constants needed for their summation theory are also naturally 
wide of the mark, some of them being indeed properly imaginary because 
of the form in which they took their logistic (Art. 30); that is, the values one 
should find in the cases of Connecticut and New Jersey for their a, b, K
(which correspond to our log b, n, 1/a) from the proper fitting by least squares
of a logistic in any form which is universally applicable would make their a
imaginary because our b is negative. But if the logistic is taken in their form
and a least squares procedure is set up for that form, no imaginary value of
their a can be found, and presumably no real value except +∞ or −∞
(see footnote 47 and Arts. 47-49). We should expect therefore that, with
infinite repetition, the least squares procedure would converge toward the
Malthusian $Ce^{nt}$, and in fact the best Malthusian fitted to the whole period
has C = .146, n = .169 and S = .0314 (for Connecticut). However, as the 
curve is concave up and the indicated type is hyper-Malthusian one might 
try fitting a critical case (7). Indeed a rough fitting of this type gives S = 
.027 instead of .0478 and thus the critical case fits the data better than their 
logistic and better than any Malthusian can be made to fit. It is perhaps 
too much to expect that biologists, even those who use considerable mathema- 
tics, should be so expert mathematically as to be right in handling problems 
of fitting equations involving constants non-linearly by least squares, but as 
one of the authors of this paper in the Journal of the Royal Statistical Society 
was trained as a mathematician, it is surprising to find their definitions de- 
parting from accepted usage and their fittings quite obviously impossible. 
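
As a rough check on the Malthusian quoted above for Connecticut, the curve can be evaluated at the end of the period. This is only a sketch: it assumes that t is counted in decades from 1790 (so that 1920 is t = 13) and that the first of the four 1920 populations listed in footnote 55, namely 1.4, is that of Connecticut.

```python
import math


def malthusian(t, C=0.146, n=0.169):
    """Malthusian curve C * exp(n t), with the constants quoted in the text."""
    return C * math.exp(n * t)


if __name__ == "__main__":
    # t = 13 corresponds to 1920 under the assumed origin of 1790.
    print(round(malthusian(13.0), 3))
```

On these assumptions the curve gives about 1.31 for 1920 against an enumerated 1.4.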

57 This can be illustrated by the simple numerical case. Using the Reed
and Pearl summation formulas, corrected for typographical errors, we find
for the summed curve K = 45.571, m = $e^{2.908}$, n = 1.1367. This curve does not pass through
the values E + E′ = 2, 7, 17 but takes for t = 0, 1, 2 the values P = 2.359,
6.626, 15.791, so that the errors E − P are −.36, +.37, +1.21. If the comparison
were made for subsequent times differences would likewise be shown between
the sum of the two component curves and the curve obtained by the
summation theory, but they might not be considered serious and the theory 
might well be considered applicable in this fictitious case. We have queried 
fitting logistics by fitting their derivatives, and we see no special merit in the 
particular summation theory derived therefrom as compared to some other, 
possibly simpler theory. Thus if we wish to have the curves add at t = ∞
we are forced to set K’’ = K + K’ and we might naturally take the a” and 
b’’ (of Reed and Pearl) as simply the weighted means of a and b (weight K) 
and a’ and b’ (weight K’). In this fictitious example the result seems quite 
as satisfactory. But no summation theory can be satisfactory in any case in 
which one component logistic is hypermalthusian and the logistic for the total
region is hypomalthusian. Wilson and Luyten were entirely familiar with the
fact that this must be the situation with respect to Connecticut, New Jersey, 
New York and the three states combined, and considered using this particular 
illustration of non-additivity; the reason they preferred to give merely a 
fictitious illustration was simply to avoid the difficulties attendant on fitting 
long series of actual figures. If one is willing so to force the fit of the individual 
components as to make them all hypomalthusian no matter what their real 
type one may possibly get a satisfactory addition theory, but one cannot at 
the same time talk of fitting the components by least squares. 
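
The figures of the fictitious example can be reproduced with a few lines. The sketch assumes the logistic form P = K / (1 + m e^(−nt)) and takes log m = 2.908, an exponent inferred here from the stated value P = 2.359 at t = 0 because the printed exponent is not legible; both are assumptions of this illustration.

```python
import math

K, LOG_M, N = 45.571, 2.908, 1.1367   # LOG_M is an inferred value (see above)
TARGETS = [2.0, 7.0, 17.0]            # the sums E + E' at t = 0, 1, 2


def summed_logistic(t):
    """Assumed form K / (1 + m * exp(-n t)) with m = exp(LOG_M)."""
    return K / (1.0 + math.exp(LOG_M - N * t))


def errors():
    """Residuals (E + E') - P at t = 0, 1, 2."""
    return [E - summed_logistic(t) for t, E in enumerate(TARGETS)]


if __name__ == "__main__":
    for t, e in enumerate(errors()):
        print(t, round(summed_logistic(t), 3), round(e, 2))
```

The residuals come out close to −.36, +.37 and +1.21, so the summed curve misses all three totals, most of all the last.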

















58 In commenting on Wilson and Luyten’s figures Reed and Pearl say
“their equation does not fit the observations, which are the significant realities
in the situation.” It is to be borne in mind that Wilson and Luyten were not
fitting by least squares but merely trying graphically to make the relative
residuals small. The success they had as compared with the least squares solution
may be seen from the table: their value for S is .066 compared with the
least squares value .041, an error of 60%, and their value for K was 22 compared
with 25, an error of 12%. If these be compared with errors of Reed and 
Pearl in solutions claimed to be by least squares, it is doubtful if they will be 
considered bad. As to the significant realities of the situation, what are they? 
That every one who works in population theory must fit by least squares with 
absolute residuals irrespective of how he may wish to define his problem? 
Nobody has yet shown that this is the case and nobody has stated it to be the 
case except Reed and Pearl whose figure by that method is 36% in error and 
whose statements about least squares differ from accepted usages. As a 
matter of information it should be stated that the manuscript of the paper by 
Wilson and Luyten was sent to Pearl before being sent to press, and on its 
return was somewhat modified and sent to him in that form. Presumably it 
was shown to Reed. If Pearl had likewise sent the copy of the manuscript of 
his reply (with Reed) to Wilson, the Journal of the Royal Statistical Society 
of London might have been spared the printing of egregious errors of theory 
and practice. What service the editor of that journal thought he was render- 
ing science by printing an article obviously in part polemic and obviously 
wrong in this part is not easily imagined. For the standards which should be 
expected of those who use mathematics see E. B. Wilson, Presidential Address, 
American Statistical Association, “Mathematics and Statistics,” J. Amer.
Statist. Assoc., March, 1930. 

59 The comparisons should presumably be all by absolute or all by relative
residuals. The figure for the United States by relative residuals we have not 
computed, but the United States is fitted so well by the logistic that we be- 
lieve it is safe to assume that the fit by relative residuals would show an 
asymptotic population not very different from 200 and decidedly less than
375.5.

60 If we note that $\delta(1/P) = -\delta P/P^2$ and if we may assimilate E − P to a
differential, we could consider the problem of fitting by absolute residuals to
be the minimization of $S = \sum (1/P - 1/E)^2 E^4$ and by relative residuals to be
the minimization of $S = \sum (1/P - 1/E)^2 E^2$. These expressions for S are linear
in two of our variables, a and b, and the computation form for minimizing S is
considerably simpler than that we have used (Art. 31). What we have said 
about the logarithmic fit (Art. 25) applies,—the quantities may not strictly 
be assimilated to infinitesimals and the fit on this assumption is in some cases 
not particularly close to the true fit by absolute residuals. For the sake of 
getting a first approximation to the fit by absolute residuals we do not believe 
that this method would be notably superior to that based on (31) which is
simpler. The conditions for relative residuals are as a matter of fact identical
with (16b) and it is this form we have used in fitting logistics by relative residu- 
als; on the other hand in fitting augmented logistics (Arts. 54 and 55) we based 
our computation form on (16a) because it was simpler than the one based on 
(16b). 
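
The simplification described in this footnote can be illustrated as follows. The sketch assumes the logistic is written 1/P = a + b e^(−nt), so that for a fixed trial value of n the weighted sum S = Σ(1/P − 1/E)² E^w is an ordinary linear least squares problem in a and b (w = 4 for absolute, w = 2 for relative residuals). The function name and the synthetic data are ours.

```python
import math


def fit_a_b(ts, Es, n, weight_power=4):
    """Minimise sum E**w * (a + b*exp(-n t) - 1/E)**2 over a and b, n fixed."""
    xs = [math.exp(-n * t) for t in ts]
    ys = [1.0 / E for E in Es]
    ws = [E ** weight_power for E in Es]
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    det = sw * sxx - sx * sx            # determinant of the 2x2 normal equations
    a = (sy * sxx - sx * sxy) / det
    b = (sw * sxy - sx * sy) / det
    return a, b


if __name__ == "__main__":
    # Synthetic populations generated exactly from a logistic (illustrative only).
    a0, b0, n0 = 0.04, 0.06, 0.5
    ts = list(range(8))
    Es = [1.0 / (a0 + b0 * math.exp(-n0 * t)) for t in ts]
    print(fit_a_b(ts, Es, n0, weight_power=4))
    print(fit_a_b(ts, Es, n0, weight_power=2))
```

With exact data both weightings return the generating constants; with real censuses they differ, and, as the footnote warns, neither need be close to the true fit by absolute residuals.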

61 Figures for populations taken from Monk and Jeter, J. Amer. Statist.
Assoc., 23, 1928, pp. 361-385; we shall have occasion to refer to results taken 
from other parts of this paper. 

62 In such cases it is necessary to go back to general exploratory methods. 
With experience one learns to get valuable indications of the direction in 
which the least squares solution lies even from the failure of the least squares 
procedure to converge. We have often found, particularly in the case of the 
augmented logistic, that the constants of the trial graphic solution must be 
decidedly nearer to those of the least squares solution than their standard 
errors if the least squares procedure is to converge. 

63 At least as early as 1922 (Proc. Nat. Acad. Sci., 8, pp. 365-368, 1922) they 
were using the augmented form. They returned to the simple form in 1927. 
They seem often to be considering the possibility of using (Pearl, Biology of 
Population Growth, see p. 17) 


$$P = d + \frac{k}{1 + m e^{a_1 t + a_2 t^2 + a_3 t^3}}$$





But in his Medical Biometry and Statistics, 2nd. Ed., 1930, Pearl works 
exclusively with the augmented logistic (Chap. XVII) which he calls the 
logistic—which it is geometrically as a curve but which it is not as a “law.” 

64 The statement in this form assumes what is the natural assumption of 
any cycle theory namely that one cycle runs its course to or near to its appro- 
priate upper asymptote d and the subsequent cycle begins from or from near 
the lower asymptote d. The condition of non-realization of the upper or 
lower asymptote complicates the interpretation of the cycle theory and makes 
the existence of the asymptotes or of cycles problematical. Possibly as 
Hotelling’ suggests the real reason for the additional constant is to get greater 
fittability. 

65 In respect to the interpretation of the constants it is necessary to note 
that the augmented logistic may be written identically® 











$$P = d + \frac{K}{1 + me^{-nt}} = d + K - \frac{K}{1 + m^{-1}e^{nt}} = d' + \frac{K'}{1 + m'e^{-n't}},$$

so that d′ = d + K, K′ = −K, m′ = 1/m, n′ = −n and r′ = 1/r, where n = log r.
If, in practice, we never found anything but hypomalthusian augmented
logistics the two forms could be kept separate and the first would be
used with all the parameters (except possibly d) positive; it is clear, however,
that the hypercritical case becomes confused; indeed we can always have
n > 0 and abolish the hypercritical case if we choose.
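
The identity between the two ways of writing the augmented logistic is easily verified numerically. A minimal sketch, with arbitrary illustrative constants and function names of our own choosing:

```python
import math


def augmented(t, d, K, m, n):
    """Augmented logistic d + K / (1 + m * exp(-n t))."""
    return d + K / (1.0 + m * math.exp(-n * t))


def reflected(d, K, m, n):
    """Primed constants: d' = d + K, K' = -K, m' = 1/m, n' = -n."""
    return d + K, -K, 1.0 / m, -n


if __name__ == "__main__":
    params = (1.0, 5.0, 3.0, 0.7)   # arbitrary illustrative values
    for t in (-3, 0, 2):
        print(augmented(t, *params), augmented(t, *reflected(*params)))
```

The same curve is thus described by a set of constants with n > 0 and by another with n < 0, which is why the sign of n alone cannot distinguish the cases.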

66 Equations are given in Pearl’s Medical Biometry and Statistics, 2nd Ed.,
p. 424. These seem to be based on (24), and are preferable, at least in most
cases, to the longer equations based on (25). They are, however, based on a 
form of the logistic which has a discontinuity (infinite values) of two of the 
unknowns as the curve passes through the Malthusian configuration. 

67 We have identically, if 1/A = C +e, 


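$$S = \left\{\sum (E - E_m)^2 - \frac{k\left[\sum A(E - E_m)\right]^2}{k\sum A^2 - (\sum A)^2}\right\} + k\left[d + \frac{\sum A\,\sum A(E - E_m)}{k\sum A^2 - (\sum A)^2} - E_m\right]^2$$
$$\quad + 2\sum A\left[d + \frac{\sum A\,\sum A(E - E_m)}{k\sum A^2 - (\sum A)^2} - E_m\right]\left[B - \frac{k\sum A(E - E_m)}{k\sum A^2 - (\sum A)^2}\right] + \sum A^2\left[B - \frac{k\sum A(E - E_m)}{k\sum A^2 - (\sum A)^2}\right]^2 .$$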


Since the quadratic form consisting of the 2nd, 3rd, and 4th terms is definitely
positive (provided A varies at all with t, as it must if n ≠ 0) we can minimize
S only by putting these terms equal to zero which gives (37) and (36). If
then we find C and n (which enter into A) so as to minimize (36) and then take 
B and d from (37) we have the solution of (35). We do not get the solution 
that way but by a least squares procedure on all four variables. However, if 
we wish to take S larger than its minimum and discuss the locus S = const. 
we can use these formulas. We find the values of C and n which give (36) the 
desired value greater than its minimum and then by virtue of the identity we 
can find from (37) the values of B and d which do not change this value of S 
(See Art. 53). 
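
The separation of the linear constants from the non-linear ones can be sketched in code. The sketch assumes the augmented logistic in the form P = d + B·A with 1/A = C + e^(−nt) (the sign of the exponent is our assumption), so that for given C and n the best d and B follow from ordinary linear least squares. Function names and data are illustrative only.

```python
import math


def basis(ts, C, n):
    """A_i = 1 / (C + exp(-n t_i)); the sign of the exponent is an assumption."""
    return [1.0 / (C + math.exp(-n * t)) for t in ts]


def sum_sq(ts, Es, d, B, C, n):
    """S = sum (E - d - B*A)^2 for the assumed form P = d + B*A."""
    return sum((E - d - B * a) ** 2 for a, E in zip(basis(ts, C, n), Es))


def best_d_B(ts, Es, C, n):
    """Values of d and B that make the quadratic form in d and B vanish."""
    A = basis(ts, C, n)
    k = len(ts)
    Em = sum(Es) / k
    sA = sum(A)
    sAA = sum(a * a for a in A)
    sAE = sum(a * (E - Em) for a, E in zip(A, Es))
    den = k * sAA - sA * sA          # positive whenever A varies with t
    B = k * sAE / den
    d = Em - B * sA / k
    return d, B


if __name__ == "__main__":
    ts = [0, 1, 2, 3, 4, 5]
    Es = [2.0 + 3.0 * a for a in basis(ts, 0.5, 0.4)]   # synthetic, exact data
    d, B = best_d_B(ts, Es, 0.5, 0.4)
    print(d, B, sum_sq(ts, Es, d, B, 0.5, 0.4))
```

For any trial pair (C, n) the function `best_d_B` gives the d and B that leave S at its least value for that pair, so the locus S = const. can be explored in the two non-linear constants alone.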

68 The solution is of course biologically inappropriate in that the population 
goes to infinity in finite time, and this will be true in any critical case; but if 
we do not try to use the equation for long range forecasting it may be consid- 
ered quite satisfactory biologically in that it shows that conditions in Germany 
have been such as to give no indication of approaching saturation insofar as 
that could be indicated by fitting an augmented logistic (and we may recall 
that the simple logistic was hypermalthusian). What we really desire to 
emphasize by expressing doubt as to the appropriateness of the solution is 
the possible if not probable mathematical inappropriateness of taking as the 
mathematical solution of a problem in curve fitting a real solution obtained 
by a procedure of successive approximations when the actual solution is 
imaginary. Such questions arise because of the non-linear form in which the 
constants enter our empirical expressions. We have tried to illustrate these 
difficulties by reference to the imperfect form in which the discussion of least 
squares is given in the books, by showing that the least squares solution in 
simple fictitious examples may depend upon the form of the empirical ex- 
pressions, and by showing that this sort of consideration is forced on us in the 
discussion of the (augmented) logistic theory of the population. Unless a 
student is enough of a mathematician to perceive such difficulties let alone 
being enough of a mathematician to get results right in cases where such
difficulties do not arise, he can have little claim to real competence in this
line of work. 

69 The least squares fit to the data from 1861 to 1910 is d = 19.671, a = 
— .024557, b = .078826, n = .10709, S = .35357. The lower asymptote is 
about 20 and the curve becomes infinite in finite time. For Pearl’s solution 
S = 1.0568 which is about three times as large as for ours. If one is to maintain
that the best way to fit the augmented logistic is by least squares with
absolute residuals he must prefer our solution to Pearl’s, and interpret it as
showing that the period 1861-1910 indicates no saturation as yet in sight for
the German population. The least squares fit to the data from 1816-1855 is 


d = 14.903, a = .039909, b = .060255, n = .54480, S = .059754.


The lower asymptote is not much less than that of the period 1861-1910, the 
upper asymptote is d + 1/a = 39.96. For Pearl’s fit S = .3858 or more than 
six times that for the least squares fit. If we are to interpret these figures on 
the cycle theory we have to state that from 1816 to 1855 Germany was grow- 
ing on one cycle based on a lower asymptote of 15 million and approaching an 
upper asymptote of 40 to which it had approximated with the figure 36 by 
the latter date. From 1861 on it was growing on a cycle based on a lower 
asymptote not of 40 (the upper asymptote of the previous cycle), not even on 
anything approaching the population 36 actually attained, but on one of 19.7 
(certainly not significantly greater than 15) and was growing with less than no 
saturation in its second cycle. Such results would seem to leave Pearl’s cycle 
theory of Germany with much to be desired—unless one should insist that 
his values of the constants obtained one knows not how and giving a relatively 
poor fit have a greater scientific validity than the least squares constants. 
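
The asymptotes discussed in this footnote follow directly from the constants. The sketch assumes the augmented logistic in the form P = d + 1/(a + b e^(−nt)), which is the form implied by the statement that the upper asymptote is d + 1/a; for the 1816-1855 fit a is taken as .039909, the value that reproduces the stated d + 1/a = 39.96. Function names are ours.

```python
import math


def upper_asymptote(d, a):
    """d + 1/a when a > 0; no finite upper asymptote when a <= 0."""
    return d + 1.0 / a if a > 0 else math.inf


def blow_up_time(a, b, n):
    """Time at which a + b*exp(-n t) vanishes (a < 0 < b, n > 0), else None."""
    if a >= 0:
        return None
    return math.log(b / -a) / n


if __name__ == "__main__":
    # Fit to 1816-1855: finite upper asymptote.
    print(round(upper_asymptote(14.903, 0.039909), 2))
    # Fit to 1861-1910: a < 0, so the curve becomes infinite in finite time
    # (t in decades from 1861, under the assumed form).
    print(round(blow_up_time(-0.024557, 0.078826, 0.10709), 2))
```

Under these assumptions the second cycle becomes infinite about 10.9 decades after 1861, which is the sense in which it shows "less than no saturation."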

70 As seven places are lost in computing the determinant D in the denomi- 
nator we cannot, even with 10 place work, claim any precision in these standard 
errors. 

71 Yule,® while warning against the danger of distant extrapolation, points
out that with Pearl’s equation the population in 1751 would have been almost 
exactly 5, as contrasted with 4.2 for one of Yule’s logistics and with 6.3 as 
estimated for that time. If we use our least squares fit for extrapolation we 
find for 1751 the population 3.4 which is worse, not better, than the figure 4.2 
or 4.3 given by a variety of good logistics, just as the forecast of the population 
of England and Wales based on the asymptote 383 is (presumably) worse than 
that on the asymptote 107. Thus at both ends the augmented logistic fitted 
by least squares with absolute residuals (the method given by Pearl® in what 
is perhaps his latest, 1930, pronouncement on the subject) is worse than the 
logistic fit, and much worse than the particular augmented logistic which 
Pearl has given without statement of the method by which it was obtained 
and which is comparatively a poor fit to the censuses available (1801-1911). 
What does this mean for the scientific methodology of the subject of this paper? 
And why should Yule extrapolate a four-constant curve which fitted (S = .6)
worse than his three-constant fits? The sorts of values of the constants which
are compatible with a fit as bad as S = .6 are treated in Art. 53. 
72 For the Three States we found graphics as follows:


$P = d + Ce^{nt}$, d = −.8, C = 1.66, n = .172, S = .47
P=d+(a— i)", d= —.7, a=.131, B = .00656, S = .39 


The value of S for the logistic (Art. 39) is .64. We are unable to build up 
much confidence in the reliability of the logistics as forecasters from such a 
welter of contradiction (or instability) as is revealed here or in the case of 
England and Wales. 

73 We do not understand why Reed and Pearl should characterize this as
small; it is larger than any they give; possibly they obtained a value of d very 
different from ours as is the case with many of their other values. 

74 Wilson and Luyten might not have written their article’? on New York if 
the Pearl-Reed values had been right, large as they were, and it is certain 
that they could not have written it if Pearl and Reed had obtained the right 
values. The main fact that must strike the student of populations is that the 
1920 population of 106 for the U. S. A. and a forecasted population of less 
than 200, forms too great a contrast with the 1920 population of 9 for Greater 
New York and a forecasted population of 35. There is presumably little 
validity in any forecast but figures around half of those obtained by Pearl 
and Reed seem more reasonable and it may be some merit (in this special case) 
that their method really gives the smaller figures. 

75 Monk and Jeter® seem to favor estimates of public service corporations 
with higher asymptotic values than those given by logistics (simple or aug- 
mented). They are entitled to their opinion; the discussion they gave of the 
Chicago region was scientific. When they conclude from that discussion that 
the logistics give poor forecasts we agree with them and the 1930 census con- 
firms them. But we see no reason to believe that there is anything in the 
application of the logistics themselves which would indicate any greater de- 
gree of inapplicability than in other cases. Not only are the standard errors 
of the constants of normal size but the actual fit has a standard deviation σ′
(error to be feared) of around .05 in the table of Art. 60 (the 1920 population
being about 3.8) whereas New York City and Environs has σ′ = .21 for the
comparable period of time 1850-1920 (the 1920 population being 9) and σ′ =
.15 for the whole period 1790-1920. Based on the terminal populations the 
errors to be feared are 1.3% for Chicago and 2.3% or 1.7% for New York 
City according as we use the comparable or the whole period of time. In 
developing a scientific method we are not at liberty to make a special explana- 
tion or a special adjustment for each special case, for thus we should lose all 
hope of the method being scientific and lapse back upon an art. Our chief 
interest here is to examine the logistic (simple or augmented) as a method 
and its breakdown in the case of Chicago as indicated by the enumerations of 
1930 is to our way of thinking an indication that the method is not sound. 







We have not overlooked the comments of Pearl and Reed, J. Amer. Statist. 
Assoc., 24, 1929, pp. 66-67, which, in view of the results of Art. 55, perhaps 
need not be taken very seriously. 

76 It must be borne in mind that fitting by absolute residuals tends to make 
the absolute residuals independent of the population and to make the relative 
residuals larger for the smaller populations. So fitting by relative residuals 
tends to make the absolute residuals less for the smaller than for the larger 
populations. These tendencies may not be in evidence in a particular case. 
It is, of course, not necessary that we should adopt a system of weights which 
would on the whole make the corresponding errors trendless (for if we had 
real evidence as to variability we should use it for an a priori determination 
of the proper system of weighting), but from an a posteriori point of view that 
is about all we can do. 

77 To advocate a complicated mathematical technique, to fail to follow it,
and yet to get better forecasts than could have been had by following it: that
is not science but Magic! We do not wish to underestimate the importance 
of magic in human affairs. To some our final conclusions (Art. 61) may seem 
from a practical point of view to be an advocacy of magical practice in the use 
of the logistic in studies of population or other growth, but we should beg to 
differ! 

78 Only the ratio $\sqrt{(k - 3) : (k - 4)}$ is involved due to the increase of the
number of constants. The large increase in the standard errors of the unknowns
must therefore be due chiefly to the fact that $A_{ii} : D$ is much larger
for the augmented logistic than for the logistic.

79 For Scotland, Pearl (Human Biology, p. 617) gives

d = .178, a = .1240, b = .5697, n = .1636, S = .076

We may offer a series of graphical fits with S only about two-thirds as great:

d =  .213,  a = .130,  b = .565,   n = .1629,  S = .050
d =  .0,    a = .120,  b = .490,   n = .1495,  S = .051
d = −.30,   a = .110,  b = .408,   n = .1361,  S = .049
d = −.50,   a = .100,  b = .3872,  n = .1253,  S = .051
d = −.80,   a = .090,  b = .3822,  n = .1130,  S = .050


Just where the least squares solution would be found we have not determined, 
but the table shows the instability of the constants. With about equal pre- 
cision of fitting the lower asymptote may be taken between the limits +.21 
and —.80, the upper asymptote between 7.9 and 10.3 and the rate of increase 
between .163 and .113 provided the constants which are highly correlated are 
properly adjusted to one another. 
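
The range of asymptotes quoted for Scotland can be recomputed from the tabulated constants. The sketch assumes, as elsewhere in these notes, that the upper asymptote of the augmented logistic is d + 1/a and the lower asymptote is d; the variable names are ours.

```python
# (d, a) pairs of the five graphical fits for Scotland, as tabulated above.
FITS = [(0.213, 0.130), (0.0, 0.120), (-0.30, 0.110),
        (-0.50, 0.100), (-0.80, 0.090)]


def upper_asymptotes(fits):
    """Upper asymptote d + 1/a of each fit (assumed form of the curve)."""
    return [d + 1.0 / a for d, a in fits]


if __name__ == "__main__":
    ups = upper_asymptotes(FITS)
    print([round(u, 2) for u in ups])
    print(round(min(ups), 1), round(max(ups), 1))
```

Fits that differ by hardly more than .001 in S thus differ by roughly 30% in the asymptotic population.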

80 The quantities $\delta_{ij}$ and $\delta_{ih}$ are uncorrelated if $r_{jh} = 0$ but the products
$\delta_{ij}\delta_{ih}^2$, $\delta_{ij}\delta_{ih}\delta_{ig}$ need not therefore vanish when averaged over i; we used the
term “independent” to indicate that as many of these higher products as
interest us shall also vanish.

































81 From early days Edgeworth was pointing out by illustrations that as n
increased the mean of a sample of n tended to lie on a normal curve. See the 
Introductory Description to his collection of papers®® with references to a 
date as early as 1888. The theorem was given a satisfactory mathematical 
formulation only by Isserlis, J. Roy. Statist. Soc., London, 81, 1918, pp. 75-81, 
which was the stimulus for a further contribution by Edgeworth, Ibid., pp. 
624-632. 

82 If, for instance, the values i = 1, ···, n correspond to n values of t
which ranges from $t_0$ to $t_1$ with a frequency P(t) and if the particular set of n
values be a fair sample we should have approximately

$$\sum_i \varphi_i^2\sigma_i^2 = n\int_{t_0}^{t_1} \varphi^2(t)\,\sigma^2(t)\,P(t)\,dt, \qquad \sum_i \varphi_i^3\mu_{3i} = n\int_{t_0}^{t_1} \varphi^3(t)\,\mu_3(t)\,P(t)\,dt,$$

etc. It is by no means necessary that the range be fixed. For example, if we
fit y = ax to a set of values of y assigned at x = 1, 2, ···, n we have

$$\sum_i \varphi_i^2\sigma_i^2 = \sum_i i^2\sigma_i^2, \qquad \sum_i \varphi_i^3\mu_{3i} = \sum_i i^3\mu_{3i}, \quad \text{etc.}$$

Now if $\sigma_i^2 = \sigma^2$ and $\mu_{3i} = \beta_1\sigma^3$ are independent of i, these sums vary with n
chiefly as $n^3$ and as $n^4$, etc., and we have

$$\beta_{1a} = C_1\frac{n^4}{n^{9/2}}, \qquad \beta_{2a} - 3 = C_2\frac{n^5}{n^6},$$

where $C_1$ and $C_2$ are constants except for powers of 1/n. Thus $\beta_{1a}$ and $\beta_{2a} - 3$
still vary chiefly as $1/n^{1/2}$ and 1/n although the range increases with n. However,
the value of $\sigma_a^2$ will in this case vary as $1/n^3$ and as the range increases
$\sigma_a$ will drop off much faster than $1/n^{1/2}$. If the standard deviations $\sigma_i^2$ varied
as $i^2$ and $\mu_{3i}$ as $i^3$, etc., we should have

$$\beta_{1a} = C_1'\frac{n^7}{n^{15/2}}, \qquad \beta_{2a} - 3 = C_2'\frac{n^9}{n^{10}},$$

and the variation is as in the simplest case. (But if $\sigma_i^2$ varies we should do
better to reduce the observations with weights; see infra.)
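
The rates of decrease stated for the fit of y = ax can be checked by direct summation. The sketch assumes errors of equal variance and equal third and fourth cumulants at x = 1, 2, ···, n, in which case the skewness of the fitted a is proportional to Σi³/(Σi²)^(3/2) and its excess kurtosis to Σi⁴/(Σi²)²; the function name is ours.

```python
import math


def shape_factors(n):
    """Factors multiplying the skewness and excess kurtosis of a single error.

    For a = sum(i*y_i)/sum(i^2) with independent, identically distributed
    errors, skewness scales as S3 / S2**1.5 and excess kurtosis as S4 / S2**2.
    """
    s2 = sum(i ** 2 for i in range(1, n + 1))
    s3 = sum(i ** 3 for i in range(1, n + 1))
    s4 = sum(i ** 4 for i in range(1, n + 1))
    return s3 / s2 ** 1.5, s4 / s2 ** 2


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        skew, kurt = shape_factors(n)
        # Both rescaled columns should settle to constants as n grows.
        print(n, skew * math.sqrt(n), kurt * n)
```

The rescaled columns approach constants, confirming that the skewness falls off as 1/n^(1/2) and the excess kurtosis as 1/n, while the variance itself, proportional to 1/Σi², falls as 1/n³.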

83 See for example, E. B. Wilson, Gibbs’ Vector Analysis, Chap. V for the
case k = 3. If the vectors be written in terms of k units as

$$\varphi_i = \varphi_{1i}u_1 + \varphi_{2i}u_2 + \cdots + \varphi_{ki}u_k,$$
$$a_j = a_{j1}u_1 + a_{j2}u_2 + \cdots + a_{jk}u_k,$$

and the law of scalar multiplication $u_h \cdot u_h = 1$, $u_h \cdot u_g = 0$, $h \neq g$, we have

$$a_j \cdot \varphi_i = a_{j1}\varphi_{1i} + a_{j2}\varphi_{2i} + \cdots + a_{jk}\varphi_{ki},$$

which is a scalar linear in the k unknowns $a_{j1}, a_{j2}, \cdots, a_{jk}$ and dependent on i
through the k values of φ which involve i with i = 1, 2, ···, n, n > k. These
laws of multiplication are equivalent to those of matrix analysis, though the 
notations may differ in different presentations of the subjects as in M. Bocher’s 
Introduction to Higher Algebra or A. N. Whitehead’s Universal Algebra, Vol. 
I (all that was published of that treatise). 














84 If ab, cd are dyads the double multiplication ab : cd means the scalar
(a·c)(b·d). The notation abc : ΦΨΩ, however, when a, b, c are vectors
and Φ, Ψ, Ω are dyadics means the triadic (a·Φ)(b·Ψ)(c·Ω).

85 The meaning will perhaps be clearer if we write the expression out.
There will be $k^3$ moments $\mu_{prt}$ of the third order each being the mean of a
product $\delta a_p\,\delta a_r\,\delta a_t$ of the variations of three specified components of a from
their means, but some of the components may be repeated since p, r, t are
not restricted to be different. If now $\Phi^{-1} = \sum C_{pq}u_pu_q$,

$$\Phi^{-1}\Phi^{-1}\Phi^{-1} = \sum C_{pq}C_{rs}C_{tu}\,u_pu_qu_ru_su_tu_u;$$

the quantities $C_{pq}$ are the mean products $\Pi_{pq}$ of the second order divided by $\sigma^2$.
Also
$$\sum_i \varphi_i\varphi_i\varphi_i = \sum_i \sum_{qsu} \varphi_{qi}\varphi_{si}\varphi_{ui}\,u_qu_su_u,$$
then
$$\mu_{3a} = \mu_3 \sum_{prt} u_pu_ru_t \Big[\sum_{qsu} C_{pq}C_{rs}C_{tu} \sum_i \varphi_{qi}\varphi_{si}\varphi_{ui}\Big],$$
or the third moment
$$\mu_{prt} = \mu_3 \sum_{qsu} C_{pq}C_{rs}C_{tu} \sum_i \varphi_{qi}\varphi_{si}\varphi_{ui}.$$

It does not appear that in general there will be any reduction in this expression
because the sum over i is cubic in the φ’s whereas the C’s are the quotients
of cofactors of a determinant by the determinant itself, each term of
which is a sum quadratic in the φ’s, viz., $c_{pq} = \sum_i \varphi_{pi}\varphi_{qi}$. The values of $\mu_{prt}$
are the same if the values of p, r, t are unchanged except for permutations.
There are k moments of the type $\mu_{ppp}$ and k(k − 1) of the type $\mu_{ppr}$ and k(k − 1)(k − 2)/6
of the type $\mu_{prt}$ with the letters p, r, t having distinct values which
makes in all k(k + 1)(k + 2)/6 different moments to be computed.
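
The count of distinct third moments can be confirmed by enumeration. A minimal sketch (function name ours): unordered index triples are classified by how many times each index is repeated.

```python
from collections import Counter
from itertools import combinations_with_replacement


def third_moment_types(k):
    """Count distinct third moments by repetition pattern of (p, r, t).

    Pattern (3,) is the type ppp, (2, 1) the type ppr, (1, 1, 1) the type prt
    with all three indices different.
    """
    counts = Counter()
    for idx in combinations_with_replacement(range(k), 3):
        pattern = tuple(sorted(Counter(idx).values(), reverse=True))
        counts[pattern] += 1
    return counts


if __name__ == "__main__":
    k = 5
    counts = third_moment_types(k)
    print(dict(counts), sum(counts.values()), k * (k + 1) * (k + 2) // 6)
```

For each k the three classes contain k, k(k − 1) and k(k − 1)(k − 2)/6 moments, and their total agrees with k(k + 1)(k + 2)/6.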
86 By definition us. is made up of tetrads of the type 





6a,6a,6a,6a,U,U,U;U,, 


where the bar denotes that the expression is to be averaged for all samples. 
There are k fourth moments obtained by averaging (éa,)*, there are k(k — 1) 
arising from (éa,)*éa,, there are 3k(k — 1) from (éa,)*(6a,),? and 3k(k — 1) 
(k — 2) from (éa,)*5a,da,, finally k(k — 1)(k — 2)(k — 3)/24 of the type 
6a,6a,6a,6a, with p, r, t, v all different. This makes k(k + 1)(k + 2)(k + 3)/24 
in all. The values are, for the normal distribution, 


‘ Qn 3 ee )» 2 _ 
Mun => 30;}, Min2 = 3? 120 1°02, Mi2 = (1 + 2) 12 Jor 02", 
ton ‘). e 92 
Mi23 = (123 + 27 12°13) 01 0203, 
éi234 = (112% 34 + 1i3%e4 + M141 23) 01020304, 


where the symbols $\sigma_i$ refer to the standard deviation of $a_i$ and $r_{ij}$ to the
correlation of $a_i$ and $a_j$. [The only value hard to obtain is that of $\mu_{1234}$, and that
becomes simple if it is observed that the mean of $x_1x_2x_3x_4$ is one third of
the mean of the sum of the three equal quantities $(x_1x_2)(x_3x_4)$, $(x_1x_3)(x_2x_4)$,
$(x_1x_4)(x_2x_3)$, which may be found by the usual process of differentiating under
the sign in the generalized probability integral.] Although there are only
$k(k+1)(k+2)(k+3)/24$ different fourth moments, there are of course $k^4$
different tetrads $u_pu_ru_tu_v$ in the tetradic sum $\mu_{4a}$. The value of the fourth
moment $\mu_{prtv}$ is the same for any given set of p, r, t, v, but there will be a
number of tetrads with the units $u_p$, $u_r$, $u_t$, $u_v$ arranged in a variety of
orders. Thus there must be for

$\mu_{pppp}$, $k$ tetrads; $\mu_{pppr}$, $4 \times k(k-1)$ tetrads;
$\mu_{pptt}$, $6 \times \tfrac12 k(k-1)$ tetrads; $\mu_{pprt}$, $12 \times \tfrac12 k(k-1)(k-2)$ tetrads;
$\mu_{prtv}$, $24 \times k(k-1)(k-2)(k-3)/24$ tetrads;

and adding we find that there are $k^4$ tetrads accounted for.
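The tally of tetrads by type can be confirmed by enumerating all $k^4$ ordered index quadruples. This is a hedged sketch only: the type labels follow the note, while the function names and zero-based indices are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Repetition pattern of the indices -> type label used in the note.
PATTERN_TO_TYPE = {
    (4,): "pppp",
    (3, 1): "pppr",
    (2, 2): "pptt",
    (2, 1, 1): "pprt",
    (1, 1, 1, 1): "prtv",
}

def tetrad_type(idx) -> str:
    """Classify an ordered index 4-tuple, e.g. (0, 1, 0, 0) -> 'pppr'."""
    pattern = tuple(sorted(Counter(idx).values(), reverse=True))
    return PATTERN_TO_TYPE[pattern]

def tetrad_counts(k: int) -> Counter:
    """Count all k**4 ordered tetrads by type."""
    return Counter(tetrad_type(idx) for idx in product(range(k), repeat=4))

def expected_counts(k: int) -> dict:
    """The counts stated in the note: multiplicity times number of distinct moments."""
    return {
        "pppp": k,
        "pppr": 4 * k * (k - 1),
        "pptt": 6 * (k * (k - 1) // 2),
        "pprt": 12 * (k * (k - 1) * (k - 2) // 2),
        "prtv": 24 * (k * (k - 1) * (k - 2) * (k - 3) // 24),
    }

if __name__ == "__main__":
    for k in range(1, 6):
        counts = tetrad_counts(k)
        print(k, dict(counts), sum(counts.values()) == k ** 4)
```

Because a `Counter` returns 0 for an absent type, the comparison with `expected_counts` also works for small k where some types cannot occur.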
Now $\mu_{2a} = \sum \overline{\delta a_p\,\delta a_r}\,u_pu_r = \sum \mu_{pr}u_pu_r$ with $\mu_{pr} = \mu_{rp}$. The product $\mu_{2a}\mu_{2a}$ has
the following types of terms, where only the indices of the units u are given:

$\mu_{11}^2$(1111), $k$ terms; $\mu_{11}\mu_{22}$(1122 + 2211), $\tfrac12 k(k-1)$ terms;
$\mu_{12}^2$(1212 + 1221 + 2112 + 2121), $\tfrac12 k(k-1)$ terms;
$\mu_{11}\mu_{12}$(1112 + 1121 + 1211 + 2111), $k(k-1)$ terms;
$\mu_{11}\mu_{23}$(1123 + 2311 + 1132 + 3211), $\tfrac12 k(k-1)(k-2)$ terms;
$\mu_{12}\mu_{13}$(1213 + 1231 + 2113 + 2131
+ 1312 + 1321 + 3112 + 3121), $\tfrac12 k(k-1)(k-2)$ terms;
$\mu_{12}\mu_{34}$(1234 + 2134 + 1243 + 2143
+ 3412 + 3421 + 4312 + 4321), $\tfrac18 k(k-1)(k-2)(k-3)$ terms.


There are $k^4$ such individual tetrads in the product $\mu_{2a}\mu_{2a}$. With the
interchanges of units corresponding to abcd + acbd + adcb we find that the
product defined as $\mu_{2a} \times \mu_{2a}$, which must contain $3k^4$ tetrads, has these types:

$3\mu_{11}^2$(1111), $k$ terms; $3\mu_{11}\mu_{12}$(1112 + 1121 + 1211 + 2111), $k(k-1)$ terms;
$(2\mu_{12}^2 + \mu_{11}\mu_{22})$(1122 + 2211 + 1221 + 2112 + 2121 + 1212), $\tfrac12 k(k-1)$ terms;
$(2\mu_{12}\mu_{13} + \mu_{11}\mu_{23})$(1213 + 1231 + 2113 + 2131 + 1312 + 1321 + 3112
+ 3121 + 1123 + 2311 + 1132 + 3211), $\tfrac12 k(k-1)(k-2)$ terms;
$\mu_{12}\mu_{34}$(1234 and all permutations thereof), $\tfrac18 k(k-1)(k-2)(k-3)$ terms.


It is clear that $\mu_{2a} \times \mu_{2a}$ has complete permutability of the units in each of the
term types. Now as $\mu_{11}$ is $\sigma_1^2$ and $\mu_{12}$ is $r_{12}\sigma_1\sigma_2$, etc., it is clear that $\mu_{4a}$ and
$\mu_{2a} \times \mu_{2a}$ are identical, as stated, when the distribution is normal.

(The only identification which is not perfectly obvious is the last involving
$\mu_{12}\mu_{34}$. We have $k(k-1)(k-2)(k-3)/8$ coefficients of the form $\mu_{12}\mu_{34}$, each
multiplied by 24 tetrads; we have to collect according to the tetrads.
Consider $u_1u_2u_3u_4$. This has $\mu_{12}\mu_{34} = r_{12}\sigma_1\sigma_2\,r_{34}\sigma_3\sigma_4$ as written. The same tetrad
will occur with coefficients $\mu_{13}\mu_{24}$ and $\mu_{14}\mu_{23}$ and only these.)
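The identification of the normal fourth moments with the pairing sum can be checked numerically. The sketch below assumes illustrative standard deviations and correlations that are not taken from the paper. It verifies only the algebraic agreement between the closed forms listed in the note and the sum over the three pairings abcd + acbd + adcb; it does not sample from a normal distribution.

```python
import math

def second_moment(sigma, r, i, j):
    """mu_ij = r_ij * sigma_i * sigma_j, with r_ii = 1."""
    return r[i][j] * sigma[i] * sigma[j]

def fourth_moment_by_pairing(sigma, r, p, q, t, v):
    """Sum of products of second moments over the three pairings of (p, q, t, v)."""
    m = lambda i, j: second_moment(sigma, r, i, j)
    return m(p, q) * m(t, v) + m(p, t) * m(q, v) + m(p, v) * m(q, t)

def closed_forms(sigma, r):
    """The five closed forms listed in the note, keyed by zero-based index quadruple."""
    s1, s2, s3, s4 = sigma
    return {
        (0, 0, 0, 0): 3 * s1 ** 4,
        (0, 0, 0, 1): 3 * r[0][1] * s1 ** 3 * s2,
        (0, 0, 1, 1): (1 + 2 * r[0][1] ** 2) * s1 ** 2 * s2 ** 2,
        (0, 0, 1, 2): (r[1][2] + 2 * r[0][1] * r[0][2]) * s1 ** 2 * s2 * s3,
        (0, 1, 2, 3): (r[0][1] * r[2][3] + r[0][2] * r[1][3] + r[0][3] * r[1][2])
        * s1 * s2 * s3 * s4,
    }

# Illustrative values only (assumptions, not data from the paper).
SIGMA = [1.0, 2.0, 0.5, 1.5]
R = [
    [1.0, 0.3, -0.2, 0.1],
    [0.3, 1.0, 0.4, 0.25],
    [-0.2, 0.4, 1.0, -0.1],
    [0.1, 0.25, -0.1, 1.0],
]

if __name__ == "__main__":
    for idx, value in closed_forms(SIGMA, R).items():
        paired = fourth_moment_by_pairing(SIGMA, R, *idx)
        print(idx, value, paired, math.isclose(value, paired, rel_tol=1e-9))
```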

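87 The theorem holds under assumption iv, namely, that $r_{ij} = 0$. If the
variations $\delta$ in the different universes are correlated the relation of the weights
to the values of $\sigma$ would be much more complicated and would involve the r's.
Even in the simplest case, n = 2, k = 1, the minimization of

$$\sigma_a^2 = \left[w_1^2\varphi_1^2\sigma_1^2 + w_2^2\varphi_2^2\sigma_2^2 + 2r_{12}w_1w_2\varphi_1\varphi_2\sigma_1\sigma_2\right] / \left(w_1\varphi_1^2 + w_2\varphi_2^2\right)^2$$

leads to

$$\frac{w_2\sigma_2^2}{w_1\sigma_1^2} = \frac{1 - r_{12}\varphi_1\sigma_2/(\varphi_2\sigma_1)}{1 - r_{12}\varphi_2\sigma_1/(\varphi_1\sigma_2)}, \qquad \frac{w_2\sigma_2\varphi_2}{w_1\sigma_1\varphi_1} = \frac{\varphi_2/\sigma_2 - r_{12}\varphi_1/\sigma_1}{\varphi_1/\sigma_1 - r_{12}\varphi_2/\sigma_2}.$$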
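As a numerical check of the two weight relations, the sketch below evaluates the variance expression at the stated ratio and at nearby ratios. The parameter values are illustrative assumptions, and the function names are hypothetical.

```python
import math

def variance_of_a(w1, w2, phi1, phi2, s1, s2, r12):
    """sigma_a^2 for n = 2, k = 1 with correlated variations, as in the note."""
    num = ((w1 * phi1 * s1) ** 2 + (w2 * phi2 * s2) ** 2
           + 2 * r12 * w1 * w2 * phi1 * phi2 * s1 * s2)
    return num / (w1 * phi1 ** 2 + w2 * phi2 ** 2) ** 2

def optimal_weight_ratio(phi1, phi2, s1, s2, r12):
    """w2/w1 obtained from the first relation in the note."""
    rhs = (1 - r12 * phi1 * s2 / (phi2 * s1)) / (1 - r12 * phi2 * s1 / (phi1 * s2))
    return rhs * s1 ** 2 / s2 ** 2

# Illustrative values only (assumptions, not data from the paper).
PHI1, PHI2, S1, S2, R12 = 1.0, 2.0, 1.0, 1.5, 0.3

if __name__ == "__main__":
    ratio = optimal_weight_ratio(PHI1, PHI2, S1, S2, R12)
    best = variance_of_a(1.0, ratio, PHI1, PHI2, S1, S2, R12)
    for factor in (0.9, 0.99, 1.0, 1.01, 1.1):
        v = variance_of_a(1.0, ratio * factor, PHI1, PHI2, S1, S2, R12)
        print(f"w2/w1 = {ratio * factor:.5f}  variance = {v:.6f}")
    # Second relation: w2*s2*phi2 / (w1*s1*phi1) against its right-hand side.
    lhs = ratio * S2 * PHI2 / (S1 * PHI1)
    rhs = (PHI2 / S2 - R12 * PHI1 / S1) / (PHI1 / S1 - R12 * PHI2 / S2)
    print(math.isclose(lhs, rhs), best)
```

With `R12 = 0` the ratio reduces to $\sigma_1^2/\sigma_2^2$, that is, weights inversely proportional to the variances.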


88 Thus a tolerably smooth curve defined by statistical data may appear
very rough if one tries to estimate its smoothness from third or fourth
differences in the data. Such arithmetical roughness is presumably just as
imaginary or unreal so far as concerns the phenomenon defined by the curve as
the imaginary constants for Chicago.
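One way to see why high-order differences exaggerate roughness is to look at how they amplify small errors. The sketch below rests on an assumption the note does not state: that the errors in successive data values are independent with equal variance. Under that assumption the variance of the d-th difference is the error variance multiplied by the sum of the squared binomial coefficients of order d.

```python
from math import comb

def difference_variance_factor(d: int) -> int:
    """Sum of squared coefficients of the d-th difference (equals comb(2d, d))."""
    return sum(comb(d, j) ** 2 for j in range(d + 1))

def nth_difference(values, d: int):
    """Apply the forward difference d times to a sequence."""
    values = list(values)
    for _ in range(d):
        values = [b - a for a, b in zip(values, values[1:])]
    return values

if __name__ == "__main__":
    for d in (1, 2, 3, 4):
        print(d, difference_variance_factor(d))
    # A smooth cubic has vanishing fourth differences ...
    cubic = [x ** 3 for x in range(8)]
    print(nth_difference(cubic, 4))
    # ... but a unit error in a single value spreads as binomial coefficients.
    bumped = cubic[:]
    bumped[4] += 1
    print(nth_difference(bumped, 4))
```

Under the stated assumption the factor is 20 for third differences and 70 for fourth differences, so errors that are negligible beside the curve itself can dominate its higher differences.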

89 Journal of the Royal Statistical Society, London, December 1898, March, 
June, September 1899. 

90 When fitting $n_i$ to $(A + Bx_i^2)^{-1}$, the constant N = 25878, which is the
total number of items in the sample, is not introduced explicitly and is not
reproduced in the result, for if we take the expression for y and integrate from
$-\infty$ to $+\infty$ we find the number 36800, which is nearly 50% in excess of N.
As the theory of $\chi^2$ depends on the distribution of a specified number N of
items, we can hardly apply $\chi^2$ to compare the actual distribution in the
histogram with the distribution computed from the frequency function here fitted.

R. A. Fisher,** in discussing efficiency, points out (p. 331) that one should
maximize the likelihood L = S log f and that the standard deviations of the
unknowns can be determined (p. 332) from the second derivatives of L. It is
interesting to apply that method of fitting to the case of

$$\frac{N}{\pi}\,\frac{c}{c^2 + (x - m)^2}, \quad\text{then}\quad \frac{N}{\pi}\,\frac{1.591}{(1.591)^2 + (x - 66.634)^2},$$

with $\sigma_c$ = .0125 and $\sigma_m$ = .0162. It is noteworthy that as the mean of the
observations is 66.701 and the median is 66.651, the value of m at 66.634
differs from the mean by four times $\sigma_m$. Further, if the standard deviation
of the mean of the observations is estimated by the usual method we find
$\sigma_{\text{mean}}$ = .0159, which is essentially the same as $\sigma_m$, although if the distribution
of the universe were actually on the curve used, the sample should not have a
determinate mean and $\sigma_{\text{mean}}$ should presumably be much larger. We may
also fit the same type of curve directly by least squares though the case is
non-linear, and what we actually did was to minimize (over the range for
which there were observations)








$$\sum \frac{(P - E)^2}{P} \quad\text{to find}\quad y = \frac{N}{\pi}\,\frac{1.9786}{(1.9786)^2 + (x - 66.630)^2}$$

with $\sigma_c$ = .193 and $\sigma_m$ = .202 when determined by the formulas usually
applied for least squares. These values are much larger due to the multiplier
S/(n − 2) in $\sigma^2$, where S is the value of $\chi^2$, which for the fit is about 5000 though
n = 27. Now, is the least squares method worth anything for this case?
We incline to think that it is not. The effort to minimize $\chi^2$ assumes tacitly
that the sample varies from the universe by fluctuations which are roughly
proportional to the square root of the numbers in the different intervals and
if the application of the method is to be considered justified a posteriori, the
observed fluctuations P − E must be approximately equal to $\sqrt{P}$ and $\chi^2 = S$
must be approximately equal to n, whereas we find the fit so bad that $\chi^2$
actually turns out to be 5000. Under these conditions the sample cannot
arise as a chance variation from a universe of the type fitted and the actual
value of $\chi^2$ for the sample cannot be put equal to the mean value of $\chi^2$ for all
samples which could arise by chance fluctuation.

Finally it may be observed that we could fit the curve by taking into account 
the region beyond that over which the sample is distributed, and probably 
should do so, because one of the facts of the observations is that although N = 
25878 no recruit was found with a height greater than 78” or less than 51” 
and furthermore the distribution tailed off to these figures in such a way as to 
imply that the termination of the distribution was natural and not forced by 
regulations. The results of including the range out to $\pm\infty$ make, of course,
a large difference (between 40% and 50%) in the value of $\chi^2$ because of the
inappropriateness of the curve type selected. The result of minimizing $\chi^2$ on
this hypothesis is

$$y = \frac{N}{\pi}\,\frac{1.8694}{(1.8694)^2 + (x - 66.616)^2}, \qquad \sigma_c = .248, \quad \sigma_m = .249.$$


The value of the constant c is materially different, that of m is not much
different, those of $\sigma_c$ and $\sigma_m$ are larger, but chiefly because $\chi^2$ is larger, an
increase from around 5000 to around 7400 amounting to one of about 25%
in $\sigma$, which by the formula varies as the square root of S.
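The arithmetic behind the multiplier S/(n − 2) and the square-root dependence on S can be sketched as follows. The rounded figures quoted in the note (S about 5000 and about 7400, n = 27) are used as inputs. Holding n − 2 fixed when comparing the two least-squares fits is an assumption, since the note does not state the number of intervals for the extended range.

```python
import math

def sigma_multiplier(S: float, n: int, n_params: int = 2) -> float:
    """Factor applied to a standard deviation when sigma^2 carries the multiplier S/(n - n_params)."""
    return math.sqrt(S / (n - n_params))

# Rounded figures quoted in the note.
S_OBSERVED_RANGE = 5000.0
S_EXTENDED_RANGE = 7400.0
N_INTERVALS = 27

if __name__ == "__main__":
    # Multiplier for the fit over the observed range.
    print(sigma_multiplier(S_OBSERVED_RANGE, N_INTERVALS))
    # Ratio of sigmas implied by the change in S alone (same n - 2 assumed).
    print(math.sqrt(S_EXTENDED_RANGE / S_OBSERVED_RANGE))
    # Ratios of the sigmas as reported in the note, for comparison.
    print(0.248 / 0.193, 0.249 / 0.202)
```

Had the fit been adequate, with S near n, the multiplier would be close to 1; here it is about 14.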








